ED 213 725                                                      TM 810 943

                            DOCUMENT RESUME

AUTHOR          Apling, Richard; Bryk, Anthony
TITLE           Policy Paper: The Predictive Validity of Early
                Childhood Variables.
INSTITUTION     Huron Inst., Cambridge, Mass.
SPONS AGENCY    Department of Education, Washington, D.C.
PUB DATE        80
NOTE            84p.
EDRS PRICE      MF01/PC04 Plus Postage.
DESCRIPTORS     *Admission Criteria; Disadvantaged Youth; *Early
                Childhood Education; Educational Diagnosis;
                *Educationally Disadvantaged; *Federal Programs;
                *Predictive Validity; *Predictor Variables; Program
                Evaluation
IDENTIFIERS     *Elementary Secondary Education Act Title I

ABSTRACT

Some early childhood variables are examined to evaluate their predictive
validity. The selection of children needing early childhood Title I services
is complicated by the lack of criteria for defining who is educationally
disadvantaged and the special problems of early childhood testing and
measurement. The study's first approach was a re-analysis of longitudinal
data on children in Head Start Planned Variation and Follow Through programs.
The second approach used meta-analysis to synthesize results of studies that
examined relationships between early childhood predictors and later outcomes.
The strengths and weaknesses of these approaches complemented each other.
Methods of selection and their predictive validity were the main focus of the
paper. Another factor to be considered was the cost of selection procedures.
Special problems exist in assessing young children because tests for this age
group are often of lower technical quality; preschool children often lack the
physical, intellectual and emotional prerequisites necessary for systematic
assessment. Selection bias may result from the use of tests or variables
which have different predictive validity for different groups. The importance
of prediction stems from the goal of most ECT-I programs: the prevention of
educational problems in later schooling. (DWH)



***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*  from the original document.                                        *
***********************************************************************




The Huron Institute 






U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION




THE HURON INSTITUTE   123 MOUNT AUBURN STREET   CAMBRIDGE, MASSACHUSETTS 02138



Policy Paper: The Predictive Validity 
of Early Childhood Variables 



By
Richard Apling
Anthony Bryk

The Huron Institute
123 Mt. Auburn Street
Cambridge, Mass. 02138

Fall 1980

Draft: Not for Citation




FOREWORD 



This policy paper has been prepared as part of a United States Education
Department (USED) sponsored project on the evaluation of early childhood
Title I (ECT-I) programs. Unlike the reports and resource books which are
other products of this endeavor, this paper is intended for a limited
audience, namely, USED staff concerned with ECT-I programs and the evaluation
of those programs. It is not intended as a practical guide to states and
local school districts on how to improve their ECT-I selection procedures.
In fact, the paper deals only with some technical issues surrounding the
selection of ECT-I children.

Deciding who receives ECT-I services is a complex multi-stage process that
involves designating Title I attendance areas, identifying children in need
of ECT-I services, and selecting those most in need for the ECT-I program.
This paper deals with the selection phase of the process by examining some
early childhood variables that could be included in a selection strategy
with regard to their predictive validity, that is, their accuracy in
predicting later educational outcomes.




TABLE OF CONTENTS

FOREWORD
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

OVERVIEW OF THE PROBLEM

SECONDARY ANALYSIS OF THE HSPV AND FT LONGITUDINAL DATA
    The HSPV/FT Data Set
    Study Variables
    Data Analysis Strategy
    Benchmark R²s
    Three Sets of Prediction Variables
    Additional Data on Early Childhood Prediction
    Measuring Misclassification to Assess ECT-I Prediction Strategies
    Summary of the Findings from the HSPV/FT Data

META-ANALYSIS OF PREDICTIVE STUDIES
    Scope of the Analysis
    Locating Studies
    Criteria for Including Studies in the Analysis
    Recording Study Characteristics
    Analyzing the Results
    The Findings of the Meta-Analysis
    The Predictive Validity of Teacher Judgment
    Limitations and Caveats
    Summary of the Findings from the Meta-Analysis

CONCLUSIONS
    Some Other Considerations
    Implications for ECT-I Selection Policy

REFERENCE NOTES

REFERENCES




LIST OF TABLES

Table

 1  Sample Background Variables for HSPV/FT and NFT Children
 2  Prediction and Outcome Variables for Cohorts III and IV
 3  Adjusted R²s for Background Variables Alone and for All Variables
    Predicting Later-Grade Test Scores
 4  Predicting First-, Second-, and Third-Grade Reading and Math Scores
    from Prekindergarten Tests and Background Variables (Cohort III)
 5  Predicting First- and Second-Grade Reading and Math Scores from
    Prekindergarten Tests and Background Variables (Cohort IV)
 6  Predicting First-, Second-, and Third-Grade Reading and Math Scores
    from Kindergarten Tests and Background Variables (Cohort III)
 7  Predicting First- and Second-Grade Reading and Math Scores from
    Kindergarten Tests and Background Variables (Cohort IV)
 8  Median R²s for Three Sets of Predictor Variables
 9  Median R²s for Three Prediction Times
10  Predicting Third-Grade Reading and Math from a Wide Range of Early
    Childhood Variables
11  Four Possible Results from Comparing Predicted and Actual Performance
12  Misclassification Rates for Strategies Using Three Cut-off Scores to
    Predict Third-Grade Reading Scores
13  Predictor and Outcome Variables Sought in Studies Assessed by the
    Meta-Analysis
14  Children's Ages and Corresponding Grades and Seasons of the Year
15  Average Correlations Between a Set of Background Variables and
    Reading, Math, and Language Arts Scores for Three Outcome Times
16  Comparison of Correlations between Background Variables and Outcomes
    from the Meta-Analysis and HSPV/FT Data Sets (Cohort III)
17  Tests as Predictors of Reading, Math, and Language Arts Outcomes
18  Average Correlations Between Types of Predictor Tests and Later-Grade
    Outcomes Unadjusted and Adjusted for Time
19  Teacher Judgment as a Predictor of Reading, Math and Language Arts
    Achievement
20  Correlations Between Magnitude of Pearson's r and Selected Study
    Characteristics, for Studies Reporting Relationships Between Early
    Childhood Tests and Reading, Math, and Language Arts Achievement




LIST OF FIGURES

Figure

 1  Test Administration for Cohorts III and IV
 2  Median R²s for Three Sets of Predictor Variables (Cohort III)
 3  Median R²s for Three Sets of Predictor Variables (Cohort IV)
 4  Median R²s for Reading and Math Outcome Measures and All Predictor
    Variables (Cohort III)
 5  Median R²s for Reading and Math Outcome Measures and All Predictor
    Variables (Cohort IV)




OVERVIEW OF THE PROBLEM

Because of our field work (Yurchak, Gelberg, & Darman, 1979; Yurchak &
Bryk, 1980) and continuing conversations with USED staff, it became
increasingly clear to us that the selection of children in need of ECT-I
services presents special problems. These include the lack of criteria for
defining who is educationally disadvantaged, disagreement on what
constitutes disadvantage before school entry, and the special problems of
early childhood testing and measurement.

Despite these complications, the Huron descriptive study of ECT-I programs
(Yurchak & Bryk, 1980) found that most Local Education Agencies (LEAs) "are
making a genuine attempt to fulfill not only the letter but also the intent
of the law regarding ECT-I selection. The LEAs visited expressed a strong
interest ... in the need to find better ways to conduct ... selection"
(p. 6-15).

The Huron study found that school districts used a wide variety of
indicators to select ECT-I children, including:

• A low score on a test or series of tests
• Teacher judgment
• A sibling who is or was a Title I student
• Parents with less than a high school education
• A child's inability to understand the language of instruction
• Parent judgment.

Although in almost every district tests were used in the ECT-I selection
process, their importance varied enormously (Yurchak & Bryk, 1980). At one
extreme, test scores were the sole determinant of who received ECT-I
services. Less extreme was the practice of considering test results together
with teacher judgment.* At the other extreme, tests were given to comply with
regulations but were not taken into account in selection decisions. In
addition to the different ways in which tests were used, the Huron study
found that many different tests were used in the districts studied. In all,
we found 26 tests used for ECT-I selection in the 29 LEAs we visited. Only
a few of these were used in more than one LEA.

The Huron study revealed widespread dissatisfaction with ECT-I selection
practices. Some local and state staff were especially concerned about the
inadequate quality of measures used to select children. Others expressed
dismay about the inability to measure important attributes such as social
and emotional development, task persistence, and the attention span of
young children. Those interviewed generally agreed that ECT-I programs are
aimed at the long-term goal of promoting general school competence in the
early elementary grades, and thus must provide the necessary precursor
skills. Unfortunately, however, there is little agreement on what those
skills are; therefore there can be little agreement on what areas should be
covered in an assessment battery.

The study reported here is an attempt to inform discussion of ECT-I
selection procedures. Since a major goal of most ECT-I programs is to
prevent problems from occurring when a child reaches elementary school, it
follows that an adequate ECT-I selection procedure must be able to predict
which children are most likely to experience later difficulty so that they
may receive ECT-I services. An important criterion for assessing any ECT-I
selection procedure is thus its predictive validity.

* Such a combination of test scores and teacher judgment was recommended in
the evaluation of the Washington, D.C., Title I program (Stenner, Feifs,
Gabriel, & Davis, 1976). The evaluators found "that a substantial number of
eligible students are not being identified, ... [and] a number of students
not needing Title I services are, on the basis of faulty test scores, being
placed in the Title I program" (p. 5). They therefore recommended that "the
exclusive reliance on standardized tests should be discontinued in favor of
a 'need index', computed from a weighted composite of teacher judgment and
criterion-referenced test scores" (p. 7).

An ideal study comparing the predictive validity of possible ECT-I
selection procedures would have several attributes. It would assess a large
number of children at an early age using diverse predictors such as early
childhood tests, socio-economic variables (for example, income and mothers'
education), home characteristics such as how much parents read to their
children, and teacher judgment. The study would follow these children until
they reached early elementary school. They would then be assessed on general
school competence in terms of school grades, achievement test scores, teacher
judgment, attitudes toward school, and so forth. Alternative selection
procedures, consisting of different combinations of these predictor
variables, could then be compared for their relative predictive validity for
later achievement test scores, future school grades, etc.

Unfortunately the ideal study for our purposes does not exist, nor is
it likely to be done. Thus we have resorted to two imperfect but useful
approaches. The first is a re-analysis of longitudinal data on children in
Head Start Planned Variation and Follow Through programs, which approximate
some characteristics of an ideal study. This re-analysis allows us to look
at several combinations of variables for predicting later achievement. The
data set has the advantage of including longitudinal data on a substantial
number of children. It is limited, however, in not including potentially
important variables such as teacher judgment and in having limited
information on family characteristics.

The second approach uses meta-analysis to synthesize findings from studies
that examine relationships between early childhood predictors and later
outcomes. The meta-analysis combines a wider variety of predictor variables
and outcomes; but, because these data come from scores of studies, it is
impossible to examine different sets of predictors simultaneously. Thus the
strengths and weaknesses of our two approaches complement each other.


SECONDARY ANALYSIS OF THE HSPV AND FT LONGITUDINAL DATA

The data on children in Head Start Planned Variation (HSPV) and Follow
Through (FT) programs that we re-analyzed were originally assembled by
Weisberg and Haney (1977) to evaluate the cumulative effects of these
programs. Because this data set contains background variables,
prekindergarten and kindergarten test scores, and later achievement test
scores for several hundred children, it is useful for assessing the
predictive power of multiple variables. In the remainder of this section we
will describe this data set,* discuss how we analyzed the data, and report
our results.

The HSPV/FT Data Set

The data on the two programs were merged to investigate "whether Follow
Through helps maintain the benefits of Head Start in the early elementary
grades; [and] the way in which Head Start experience of children may have
confounded efforts in the national evaluation of Follow Through to calculate
program effects" (Weisberg & Haney, 1977, p. i). As Weisberg and Haney
point out, this data set is probably unique.

    To our knowledge, these files represent the only data set
    with information on the experience and development of children
    from HS entry through the end of third grade. While it is
    in many respects painfully limited, it represents a unique
    source which required a considerable effort to create and
    may be of interest for purposes of secondary analysis. (P. 11)

* For a more comprehensive discussion of the data and of the original study,
see Weisberg and Haney (1977).



Like many longitudinal data sets, the HSPV/FT data is "painfully limited"
in several ways. For one, variables are inconsistent across groups: two
cohorts of children were followed from prekindergarten through early
elementary school,* but they received few tests in common. There are also
inconsistencies within cohorts; for example, different versions of the
Caldwell Preschool Inventory (PSI) were used by the two programs for
cohort III. In addition, these are not data from random samples of children.
As Weisberg and Haney (1977) point out, this is a special sample produced by
a complex selection process:

    The flow of children into, through, and out of Head Start
    and Follow Through constitutes a vast and complex process.
    Children were selected for Head Start on the basis of general
    criteria applicable nationally, but local circumstances
    determined the specific make-up of program groups. Thus groups
    of Head Start children in different places vary widely on
    numerous dimensions. In Follow Through, too, the likelihood
    of participation depends on children's characteristics and
    local circumstances. Moreover, Head Start experience is one
    of the factors taken into account in the selection process.
    (P. 24)

As with all longitudinal studies, attrition creates problems with the
data. Some children, although they remain in the sample throughout the
study, inevitably are absent when some tests are given, and these data are
lost. Similarly, other children leave the program, move to other schools,
or for other reasons are unavailable for subsequent data collection. And
children leave as they entered the study: in non-random patterns that make
generalization to large groups difficult. As Table 1 shows, the usable
samples were about half of the original cohorts.

* Figure 1 shows the years and seasons of the years when tests were
administered to the two cohorts of children.



[Figure 1 is a chart, not reproduced here. It marks test administration
times for Cohort III and Cohort IV against the fall and spring of the school
years 1970-71, 1971-72, 1972-73, 1973-74, and 1974-75, with the HSPV year
labeled for each cohort.]

* Test administration times

+ Follow Through cohorts. No Head Start data were available for children in
Cohorts I and II; therefore these groups are excluded from the analysis.

Figure 1: Test Administration for Cohorts III and IV.
(Adapted from Weisberg and Haney, 1977, p. 6)



Table 1: Sample Background Variables for HSPV/FT and NFT Children

                                                                Non Follow
                               Cohort III       Cohort IV       Through*

Sex                            57% boys         52% boys        51.2% boys
Ethnicity                      45% nonwhite     49% nonwhite    64% nonwhite
Average Family Income          $3700 (1970)     $3700 (1971)    $5900
Median Father's Education      Grade 10         Grade 10.3
Median Mother's Education      Grade 10.4       Grade 10.5      Grade 11.6
Father's Occupational Status   12% unemployed   19% unemployed
Family Receives Aid            50% yes          40% yes
First Language                 95% English      96% English     94% English
Sample Size                    396              725             8676
Approximate Usable Sample
  Size                         200              400

* Data from Molitor, Watkins, and Napier, 1977, p. 12.




Finally, the children in the HSPV/FT sample are probably more disadvantaged
than the pool of children from which ECT-I participants are chosen.
Table 1 summarizes several background variables for the HSPV/FT children
and the Non Follow Through (NFT) control group, which was made up mainly of
children from Title I schools (Haney, 1977, pp. 165-166). Clearly the two
samples differ significantly in minority enrollment, family income, and
mother's education.

Despite these problems, the data are a unique resource for examining
the predictive validity of ECT-I variables. They are valuable for our
purposes because they follow children from preschool through second and
third grade. For the subsample of children for whom complete records are
available, we can easily examine the comparative predictive power of
different groups of variables. In addition, children in the sample do not
come from just one area but are drawn from 13 sites in 11 states (Minnesota,
Utah, Washington, New Jersey, Nebraska, Delaware, Missouri, Illinois,
Colorado, Florida, and Pennsylvania) that represent a geographical diversity.
Study Variables 


Table 2 lists the independent and dependent variables included in the
analysis of the HSPV/FT data set. Outcomes are total reading and total math
scores of the Metropolitan Achievement Test (MAT). This test was given in
the spring to the first, second, and third grades of Cohort III and to the
first and second grades of Cohort IV. All outcomes are raw scores. The
same background variables were collected for the two cohorts. The original
data set had several other background measures, excluded here because of
large numbers of missing cases or high correlations with other background


Table 2: Prediction and Outcome Variables for Cohorts III and IV

Cohort III                               Cohort IV

                          Background Variables

Sex                                      Sex
Age                                      Age
Ethnicity                                Ethnicity
Total Family Income                      Total Family Income
Mother's Education                       Mother's Education
Family Receives Aid                      Family Receives Aid
Number in Household                      Number in Household
First Language in Home                   First Language in Home

                          Prekindergarten Tests

PSI (Fall)                               PPV (Fall)
PSI (Spring)                             PPV (Spring)
NYU Booklet 3D (Fall)                    PSI (Fall)
NYU Booklet 3D (Spring)                  PSI (Spring)
NYU Booklet 4A (Fall)                    WRAT Reading (Fall)
NYU Booklet 4A (Spring)                  WRAT Math (Spring)
                                         WRAT Numbers (Fall)
                                         WRAT Numbers (Spring)

                          Kindergarten Tests

PSI (Fall)                               MAT Primer Reading (Spring)
WRAT Reading (Fall)                      MAT Primer Numbers (Spring)
WRAT Reading (Spring)
WRAT Spelling (Fall)
WRAT Spelling (Spring)
WRAT Math (Fall)
WRAT Math (Spring)
PPV (Fall)
PPV (Spring)
Lee-Clark Reading Readiness (Fall)
MAT Primer Reading (Fall)
MAT Primer Numbers (Fall)

                          Outcome Variables

MAT Primary I Total Reading (1st gr)     MAT Primary I Total Reading (1st gr)
MAT Primary I Total Math (1st gr)        MAT Primary I Total Math (1st gr)
MAT Primary II Total Reading (2nd gr)    MAT Primary II Total Reading (2nd gr)
MAT Primary II Total Math (2nd gr)       MAT Primary II Total Math (2nd gr)
MAT Elementary Total Reading (3rd gr)
MAT Elementary Total Math (3rd gr)

MAT = Metropolitan Achievement Test
PSI = Caldwell Preschool Inventory
WRAT = Wide Range Achievement Test
PPV = Peabody Picture Vocabulary




variables.* As Table 2 shows, there is little similarity between
prekindergarten and kindergarten tests in the two cohorts. Only the fall PSI
is common to both among the prekindergarten measures. Subtests of the MAT
Primer are included in both sets of kindergarten predictors, but were given
in different seasons in the two cohorts. Although this variability in the
two sets of predictors makes it difficult to compare the two cohorts, such
comparisons, when feasible, bring additional information to our study.

Data Analysis Strategy 

The central question for analyzing the HSPV/FT data is whether combinations
of early childhood variables do better than single variables in predicting
problems in later school experience. In examining this question we looked
at three procedures for predicting later achievement:

• Using an individual test or subtest
• Using a set of tests or subtests
• Using a set of tests and background variables.

We examined each procedure in two ways. First, we used multiple regression
to generate R²s (the percentage of variation in outcome variables explained
by individual variables and by sets of variables). Then we determined
which individual test or subtest accounted for most variation in an
outcome. Next we added other tests or subtests that contributed significantly
to the prediction of later achievement scores. Finally we added a set of
background variables to the set of tests. By measuring increments to the
R²s as we added successive sets of variables, we could compare the predictive
power of the three procedures.

* In Cohort III these include Present Family Income (high correlation with
Family Income), Father's Education (199 missing cases), Father's Occupation
(156 missing), Father's Employment Status (164 missing), Mother's Employment
Status (164 missing). In Cohort IV, variables dropped are Father's Schooling
(358 missing), Father's Employment Status (312 missing), and Second Language
(675 missing).
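
The comparison of increments to R² can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run on the HSPV/FT files: the scores are made up, the function names (`solve`, `r_squared`, `r2_increments`) are hypothetical, and ordinary least squares is fitted through the normal equations in plain Python. It reports the unadjusted R² after each set of predictors is added, together with the increment over the previous step.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """Unadjusted R^2 from a least squares fit of y on the given predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept first
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_increments(steps, y):
    """Cumulative R^2 and its increment as each named set of predictors is added."""
    used, previous, report = [], 0.0, []
    for name, cols in steps:
        used = used + cols
        r2 = r_squared(used, y)
        report.append((name, r2, r2 - previous))
        previous = r2
    return report


# Made-up scores for eight children (illustration only, not HSPV/FT data).
best_test = [12, 15, 9, 20, 17, 11, 14, 19]
other_test = [30, 28, 25, 41, 33, 27, 35, 38]
mother_ed = [10, 12, 9, 12, 11, 10, 11, 13]
reading = [41, 50, 33, 66, 54, 38, 49, 60]

report = r2_increments(
    [("best test", [best_test]),
     ("other tests", [other_test]),
     ("background", [mother_ed])],
    reading,
)
for name, r2, gain in report:
    print(f"{name:12s} R2 = {r2:.3f}  increment = {gain:.3f}")
```

Because each step contains all the predictors of the step before it, the unadjusted R² cannot fall from one step to the next; the size of each increment is what distinguishes the three procedures.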

In addition to examining increments to R², we also analyzed the number
of misclassifications produced by procedures predicting third-grade reading
scores. Misclassification results when a procedure predicts either that a
child will experience educational disadvantage and he does not or that a
child will not experience disadvantage when in fact he does. Thus we have a
second way to compare prediction procedures: what are the rates of
misclassification that result from each? In this subsection, we will discuss
the multiple regression analysis and report our findings; in the next, we
will explain our analysis of error rates and examine those results.
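
The two kinds of misclassification can be counted as in the sketch below. It rests on assumptions that are not taken from the paper: a child is treated as "in need" when a score falls below a cutoff, the same cutoff is applied to the predicted and the actual score, and the scores themselves are invented. The function name `misclassification_rates` and the dictionary keys are hypothetical.

```python
def misclassification_rates(predicted, actual, cutoff):
    """Cross-classify predicted against actual need using one cutoff score.

    A child counts as 'in need' when the score is below the cutoff.
    Returns counts for the four possible results plus the overall error rate.
    """
    counts = {"true_need": 0, "false_need": 0, "missed_need": 0, "true_no_need": 0}
    for p, a in zip(predicted, actual):
        predicted_need = p < cutoff
        actual_need = a < cutoff
        if predicted_need and actual_need:
            counts["true_need"] += 1       # correctly selected
        elif predicted_need:
            counts["false_need"] += 1      # selected, but did not have difficulty
        elif actual_need:
            counts["missed_need"] += 1     # not selected, but did have difficulty
        else:
            counts["true_no_need"] += 1    # correctly not selected
    errors = counts["false_need"] + counts["missed_need"]
    counts["error_rate"] = errors / len(actual)
    return counts


# Invented predicted and actual third-grade scores for eight children.
predicted = [22, 35, 41, 18, 30, 27, 45, 25]
actual = [20, 28, 44, 31, 33, 24, 50, 29]
result = misclassification_rates(predicted, actual, cutoff=30)
print(result)
```

Running the same function with several cutoffs shows how the balance between the two kinds of error shifts as the selection standard is moved.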

In designing the multiple regression analysis of the HSPV/FT data, we
decided to analyze cohorts III and IV separately because different early
childhood tests are used with the two cohorts. We chose to do separate
analyses for MAT reading scores and MAT math scores because later Title I
programs are often aimed at ameliorating either reading or math problems.
We also decided to analyze separately predictors measured in the fall and
in the spring. This resembles ECT-I procedures in that a program might use
either spring or fall data to select children.

Benchmark R²s

To compare the predictive power of single tests and sets of variables,
we first determined benchmark R²s by seeing how well background variables
alone predict reading and math scores in grades 1, 2, and 3, and by examining
how well all available variables predicted the same outcomes. These two
groups of R²s provide reference points by which to judge how well other
combinations of variables predict achievement test scores.




Table 3 shows the results of our analyses with background variables
and with all variables for each outcome measure. For each outcome in the
two cohorts, we entered all background variables listed in Table 2 as
independent variables in a stepwise regression. We stopped at the step for
which all variables entered with F ≥ 1.00. This cutoff rule ensures
that random variation was not added to the prediction equation. The first
row for background variables in Table 3 reports the R² at the last step
for which F was greater than or equal to one. Each R² (in this and other
tables from the HSPV/FT analysis) is adjusted for sample size and for the
number of variables in the prediction equation. (See Cohen & Cohen, 1975,
pp. 106-107, for a discussion of adjusted R².)
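
A short sketch may help show why an F of 1.00 is a natural stopping point. It assumes the usual shrinkage formula for adjusted R² and the usual F statistic for a single added predictor; the paper cites Cohen and Cohen (1975) but does not print either formula, so both are assumptions here, as are the function names `adjusted_r2` and `f_to_enter` and the example numbers.

```python
def adjusted_r2(r2, n, k):
    """Adjust R^2 for sample size n and k predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_to_enter(r2_before, r2_after, n, k_after):
    """F statistic for one variable that raises R^2 from r2_before to r2_after.

    k_after is the number of predictors in the equation once the variable is in.
    """
    return (r2_after - r2_before) / ((1.0 - r2_after) / (n - k_after - 1))


n = 103  # invented sample size

# A second predictor that lifts R^2 from .40 to .41 enters with F above 1 ...
helpful_f = f_to_enter(0.40, 0.41, n, k_after=2)
# ... while one that lifts it only to .404 enters with F below 1.
noise_f = f_to_enter(0.40, 0.404, n, k_after=2)

print(round(helpful_f, 2), adjusted_r2(0.41, n, 2) > adjusted_r2(0.40, n, 1))
print(round(noise_f, 2), adjusted_r2(0.404, n, 2) > adjusted_r2(0.40, n, 1))
```

Under these formulas, adding a single variable raises the adjusted R² only when its F exceeds 1, so stopping at F ≥ 1.00 keeps every variable that improves the adjusted figure and drops those that would lower it.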

We followed a similar procedure for examining the predictive power of
all variables. The background variables, fall prekindergarten tests, and
fall kindergarten tests listed in Table 2 were entered as independent
variables in stepwise regressions. The outcome variables in Table 2 were
the dependent variables. The same cutoff rule (F ≥ 1.00) was used to decide
when to stop adding variables. The analysis was then repeated using spring
prekindergarten and kindergarten tests. The result can be seen in the bottom
half of Table 3.


We conclude several things from Table 3. First, the overall R²s with
all variables in the equations are substantial. For fall predictions, R²s
range from 0.32 to 0.62, and for spring prediction, from 0.38 to 0.61.
Apparently, a set of background variables and early childhood tests account
for significant amounts of the variation in later scores.

Looking again at Table 3, we see that background variables account for
some, but not a great deal, of later test variation. This is not surprising



Table 3: Adjusted R²s for Background Variables Alone and for
         All Variables Predicting Later-Grade Test Scores

                              Cohort III                      Cohort IV
                MAT   MAT   MAT   MAT   MAT   MAT     MAT   MAT   MAT   MAT
                Read. Math  Read. Math  Read. Math    Read. Math  Read. Math
                1     1     2     2     3     3       1     1     2     2

Background
Variables   R²  .11   .17   .14   .13   .15   .12     .18   .11   .11   .12
            n   202   200   169   166   136   129     473   469   415   411

All
Variables*
  Fall      R²  .44   .62   .3?   .49   .36   .52     .39   .32   .40   .35
            n   141   139   122   121   97    94      455   432   383   382
  Spring    R²  .49   .59   .59   .45   ?     .40     .61   .57   .54   .54
            n   137   137   117   115   93    91      384   381   347   343

* Includes all prekindergarten and kindergarten tests and background
  variables that entered the prediction equations at F = 1.00 or more.




since the background variables in the HSPV/FT data are fairly crude measures.
Prediction might have been improved if we had also had measures of home
environment and parent-child interaction. Truncation of these background
variables may also explain their modest predictive power. The children in
our sample were selected for HSPV and FT programs on the basis of
socio-economic measures. Thus this sample is more uniform than children in
general on variables such as family income and mother's education, and this
contributes to lower correlations and lower R²s.

Table 3 presents two extremes against which to compare prediction
procedures. Using only simple background variables we explain roughly 1[?]
percent of the variation in later scores. Using all the variables at our
disposal, which we would not expect any ECT-I program to have available,
between 35 and 45 percent is a reasonable expectation. Other combinations of
predictor variables, considered below, fall between these two extremes.
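
The effect of truncation on the background variables can be illustrated with the standard formula for direct restriction of range on a predictor. This is a sketch under assumptions that go beyond the paper: selection is taken to act directly on the predictor, the relationship is taken to be linear with constant error variance, and the starting correlation of .50 and the halving of the standard deviation are invented numbers. The function name `restricted_r` is hypothetical.

```python
from math import sqrt


def restricted_r(r_full, sd_ratio):
    """Correlation expected after selection narrows the predictor's spread.

    r_full   -- correlation in the unselected population
    sd_ratio -- predictor SD in the selected sample divided by its SD in the
                full population (1.0 means no restriction)
    """
    return r_full * sd_ratio / sqrt(1.0 - r_full ** 2 + (r_full * sd_ratio) ** 2)


# If income correlated .50 with later scores among all children, but the
# selected sample had only half the income spread, the observed correlation
# would drop sharply, and R^2 (the squared correlation) even more so.
r_selected = restricted_r(0.50, 0.5)
print(round(r_selected, 3), round(r_selected ** 2, 3))
```

With no restriction (`sd_ratio` of 1.0) the function returns the original correlation unchanged, which is a quick check that the formula behaves as intended.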
Three Sets of Prediction Variables 

-We used procedures similar to our analyses of background vvariables and 
all variables to examine the 'predictive validity of three ^ternative sets 
of variables,-- a test or subtest used alone, a group of 4ests or subtests 
added to the single" instrument", and background variables a<fted to the set 
of tests or subtests. We performed separate analysed 4 on tests given in 
prekindergarten and in kindergarten. We separate^ analyzed tests given 
in the fall and in the spring. We also analyzed the outcomes separately— 
first-, 'second-, and third-grade reading, and first-, second-, and third- ^* 
grade math/ Finally we analyzed data from the two cohorts separately. m 

For each combination of predictor test time (e.g., kindergarten, fall),
outcome measure and time (e.g., first-grade reading), and cohort, we followed
the same analytic procedures. First we entered all appropriate tests and
subtests (e.g., fall kindergarten measures) as independent measures in a
stepwise regression. We used the F > 1.00 rule to determine the best set
of tests or subtests. We then performed several regressions, in turn entering
each test or subtest from the best set first. The test or subtest that
produced the highest R² was designated as the best individual measure. We
then added the remaining tests or subtests to the best measure. Finally we
added all background variables stepwise after forcing the best set of tests
into the prediction equation. Once again, we stopped adding background
variables just before F dropped below 1.00.
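
The stepwise procedure described above can be sketched in code. This is a
minimal illustration under stated assumptions, not the authors' actual
program: it assumes ordinary least squares, forward entry of the candidate
with the largest partial F, and synthetic data. The function names
(rss, forward_stepwise, adjusted_r2) and the forced argument are hypothetical
names chosen for this sketch.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares from OLS of y on an intercept plus X[:, cols]."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, forced=(), f_to_enter=1.0):
    """Add predictors one at a time while the best partial F exceeds f_to_enter.

    Columns listed in `forced` are entered first and never removed, which
    mimics forcing the best set of tests in before adding background variables.
    """
    n, p = X.shape
    selected = list(forced)
    current = rss(X, y, selected)
    while True:
        best = None
        for j in range(p):
            if j in selected:
                continue
            trial = selected + [j]
            df_resid = n - len(trial) - 1
            if df_resid <= 0:
                continue
            new = rss(X, y, trial)
            # Partial F for adding one variable to the current equation.
            f_stat = (current - new) / (new / df_resid)
            if best is None or f_stat > best[0]:
                best = (f_stat, j, new)
        if best is None or best[0] <= f_to_enter:
            return selected  # stop just before F drops to 1.00 or below
        selected.append(best[1])
        current = best[2]

def adjusted_r2(X, y, cols):
    """Adjusted R-squared for the equation using the given columns."""
    n = len(y)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss(X, y, cols) / tss
    return 1.0 - (1.0 - r2) * (n - 1) / (n - len(cols) - 1)

# Synthetic example (hypothetical data): columns 0 and 2 carry the signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=200)
selected = forward_stepwise(X, y)
print(selected, round(adjusted_r2(X, y, selected), 2))
```

A variable whose F to enter exceeds 1.00 is also one that raises the adjusted
R², which is one way to read the stopping rule used here.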

Tables 4, 5, 6, and 7 contain the results from these analyses. Tables
4 and 5 show results for each childhood test administered to prekindergarteners.
Tables 6 and 7 contain test results for kindergarten children. Tables 4 and
6 are taken from cohort III; 5 and 7 are from cohort IV. Each table is read
in the same way. The first row of numbers presents the adjusted R²s for the
single prekindergarten or kindergarten test that best predicts the outcome
shown at the top of each column. The next row shows the R²s and increments
to R² when other prekindergarten or kindergarten tests are added to the
prediction equation. The final row shows R²s and increments when background
variables are added to the best set of tests or subtests.

There are several patterns in Tables 4 through 7 that are partially
masked because of the amount of data presented. To help clarify these
patterns we have calculated median R²s. These medians are presented in



Table 4: Predicting First-, Second-, and Third-Grade Reading and Math Scores
From Prekindergarten Tests and Background Variables (Cohort III)

Best Test or Subtest

Outcome Test    Test Time  Adj. R²      n            Test
MAT 1 Reading   Fall       .15          144          PSI
MAT 1 Reading   Spring     .25          144          NYU Booklet 4A
MAT 1 Math      Fall       .25          142          PSI
MAT 1 Math      Spring     .35          142          NYU Booklet 4A
MAT 2 Reading   Fall       .13          124          PSI
MAT 2 Reading   Spring     [illegible]  126          NYU Booklet 4A
MAT 2 Math      Fall       .14          123          PSI
MAT 2 Math      Spring     .19          126          PSI
MAT 3 Reading   Fall       .14          98           PSI
MAT 3 Reading   Spring     .19          101          NYU Booklet 4A
MAT 3 Math      Fall       .09          95           PSI
MAT 3 Math      Spring     .17          [illegible]  NYU Booklet 4A

Best Set of Tests or Subtests

Outcome Test    Test Time  Adj. R²  Inc. R²  n    Tests
MAT 1 Reading   Fall       .15               144  PSI
MAT 1 Reading   Spring     .26      .01      144  PSI, NYU 4A, NYU 3D
MAT 1 Math      Fall       .25               142  PSI
MAT 1 Math      Spring     .38      .03      142  PSI, NYU 4A, NYU 3D
MAT 2 Reading   Fall       .13               124  PSI
MAT 2 Reading   Spring     .16      .01      126  PSI, NYU 4A, NYU 3D
MAT 2 Math      Fall       .14               123  PSI
MAT 2 Math      Spring     .23      .04      125  PSI, NYU 4A, NYU 3D
MAT 3 Reading   Fall       .14               98   PSI
MAT 3 Reading   Spring     .19               101  NYU 4A
MAT 3 Math      Fall       .09               95   PSI
MAT 3 Math      Spring     .18      .01      97   PSI, NYU 4A, NYU 3D

Background Variables Added to Best Tests

Outcome Test    Test Time  Adj. R²  Inc. R²  n            Variables Added
MAT 1 Reading   Fall       .20      .05      144          Sex, Ethnicity, Mom Ed., Age, Income, Fam. Size
MAT 1 Reading   Spring     .29      .03      144          Sex, Income, Fam. Size, Mom Ed., Age
MAT 1 Math      Fall       .37      .12      142          Ethnicity, Receives Aid, Mom Ed., 1st Lang.
MAT 1 Math      Spring     .44      .06      142          Ethnicity, Mom Ed., Income, 1st Lang.
MAT 2 Reading   Fall       .21      .08      124          Sex, Ethnicity, Receives Aid, Fam. Size, Income, Age
MAT 2 Reading   Spring     .21      .05      126          Sex, Ethnicity, 1st Lang., Income, Fam. Size, Receives Aid
MAT 2 Math      Fall       .24      .10      123          Ethnicity, Receives Aid, Mom Ed., Mom Occ.
MAT 2 Math      Spring     .29      .06      [illegible]  Ethnicity, Mom Ed., Receives Aid, 1st Lang.
MAT 3 Reading   Fall       .21      .07      98           Income, Fam. Size, Age, Receives Aid, Ethnicity, Mom Ed., Mom Occ.
MAT 3 Reading   Spring     .24      .05      101          Income, Fam. Size, Mom Occ., Age, Mom Ed.
MAT 3 Math      Fall       .17      .06      95           Rec. Aid, Ethnicity, Mom Ed.
MAT 3 Math      Spring     .23      .05      [illegible]  Rec. Aid, Mom Ed., Ethnicity, Mom Occ.



Table 5: Predicting First- and Second-Grade Reading and Math Scores From
Prekindergarten Tests and Background Variables (Cohort IV)

Best Test or Subtest

Outcome Test             Test Time  Adj. R²  n            Test
MAT Reading 1st Grade    PK Fall    .28      [illegible]  WRAT Reading
MAT Reading 1st Grade    PK Spring  .49      423          WRAT Reading
MAT Math 1st Grade       PK Fall    .20      432          PSI
MAT Math 1st Grade       PK Spring  .36      407          PSI
MAT Reading 2nd Grade    PK Fall    .28      383          PSI
MAT Reading 2nd Grade    PK Spring  .42      376          WRAT Reading
MAT Math 2nd Grade       PK Fall    .27      382          PSI
MAT Math 2nd Grade       PK Spring  .35      [illegible]  WRAT Reading

Best Set of Tests and Subtests

Outcome Test             Test Time  Adj. R²  Inc. R²  n    Tests
MAT Reading 1st Grade    PK Fall    .35      .07      433  PSI, Peabody, WRAT Read, WRAT Num.
MAT Reading 1st Grade    PK Spring  .53      .04      408  PSI, WRAT Read, WRAT Num.
MAT Math 1st Grade       PK Fall    .31      .05      432  Peabody, PSI, WRAT Read
MAT Math 1st Grade       PK Spring  .44      .08      407  PSI, WRAT Read, WRAT Num.
MAT Reading 2nd Grade    PK Fall    .36      .08      383  Peabody, PSI, WRAT Read, WRAT Num.
MAT Reading 2nd Grade    PK Spring  .48      .06      360  PSI, WRAT Read, WRAT Num.
MAT Math 2nd Grade       PK Fall    .32      .05      382  PSI, WRAT Read, WRAT Num.
MAT Math 2nd Grade       PK Spring  .43      .06      348  Peabody, PSI, WRAT Read, WRAT Num.

Background Variables Added to Best Set of Tests

Outcome Test             Test Time  Adj. R²  Inc. R²  n    Variables Added
MAT Reading 1st Grade    PK Fall    .39      .02      433  Rec. Aid, Sex, 1st Lang., Income, Fam. Size
MAT Reading 1st Grade    PK Spring  .56      .03      408  Sex, Receives Aid, Ethnicity, Mom Ed., Age, Income, 1st Lang.
MAT Math 1st Grade       PK Fall    .32      .01      432  Income, Ethnicity, 1st Lang.
MAT Math 1st Grade       PK Spring  .46      .02      407  Ethnicity, Income
MAT Reading 2nd Grade    PK Fall    .40      .04      383  1st Lang., Income, Family Size, Age, Sex, Receives Aid
MAT Reading 2nd Grade    PK Spring  .51      .03      360  Mom Ed., Age, Mom Occ., Sex, Income, Family Size, Ethnicity
MAT Math 2nd Grade       PK Fall    .35      .03      382  Income, 1st Lang., Sex
MAT Math 2nd Grade       PK Spring  .45      .02      348  Income, Ethnicity, Mom Occ.



Table 6: Predicting First-, Second-, and Third-Grade Reading and Math Scores
From Kindergarten Tests and Background Variables (Cohort III)

Best Test or Subtest

Outcome Test    Test Time  Adj. R²      n            Test
MAT 1 Reading   Fall       .33          202          WRAT Reading
MAT 1 Reading   Spring     .39          189          WRAT Reading
MAT 1 Math      Fall       .45          197          MAT Primer Numbers
MAT 1 Math      Spring     .46          188          WRAT Math
MAT 2 Reading   Fall       .23          168          WRAT Reading
MAT 2 Reading   Spring     .28          157          WRAT Reading
MAT 2 Math      Fall       .32          165          MAT Primer Numbers
MAT 2 Math      Spring     .30          154          WRAT Math
MAT 3 Reading   Fall       .22          136          WRAT Reading
MAT 3 Reading   Spring     .29          131          WRAT Reading
MAT 3 Math      Fall       [illegible]  [illegible]  MAT Primer Numbers
MAT 3 Math      Spring     [illegible]  [illegible]  WRAT Spelling

Best Set of Tests or Subtests

Outcome Test    Test Time  Adj. R²  Inc. R²  n    Tests
MAT 1 Reading   Fall       .41      .08      199  PSI, WRAT Read, WRAT Spell, Peabody, Lee-Clark, MAT Read, MAT Numbers
MAT 1 Reading   Spring     .46      .07      189  WRAT Read, WRAT Sp, WRAT Math
MAT 1 Math      Fall       .56      .09      197  WRAT Read, WRAT Sp, WRAT Math, Lee-Clark, MAT Read, MAT N
MAT 1 Math      Spring     .54      .08      188  WRAT Read, WRAT Sp, WRAT Math
MAT 2 Reading   Fall       .32      .09      168  WRAT Read, Peabody, MAT Read, MAT N
MAT 2 Reading   Spring     .36      .08      149  WRAT R, WRAT Sp, WRAT M, Peabody
MAT 2 Math      Fall       .41      .09      165  PSI, WRAT R, Lee-Clark, Peabody, MAT R, MAT N
MAT 2 Math      Spring     .40      .10      146  WRAT R, WRAT Sp, WRAT M, Peabody
MAT 3 Reading   Fall       .32      .10      136  WRAT R, Peabody, MAT R, MAT N
MAT 3 Reading   Spring     .35      .06      131  WRAT R, WRAT Sp, WRAT M
MAT 3 Math      Fall       .30      .05      129  WRAT R, Lee-Clark, MAT R, MAT N
MAT 3 Math      Spring     .39      .11      127  WRAT R, WRAT Sp, WRAT M

Background Variables Added to Best Tests

Outcome Test    Test Time  Adj. R²  Inc. R²  n    Variables Added
MAT 1 Reading   Fall       .45      .04      199  Sex, Income, Receives Aid, 1st Lang., Ethnicity, Mom Occ., Mom Ed.
MAT 1 Reading   Spring     .49      .03      189  Mom Ed., Receives Aid, Income, Age, 1st Lang., Ethnicity, Sex
MAT 1 Math      Fall       .62      .06      197  Ethnicity, Income, Mom Occ., Mom Ed., Fam. Size
MAT 1 Math      Spring     .59      .05      188  Ethnicity, Mom Ed., Income, Mom Occ.
MAT 2 Reading   Fall       .38      .06      168  Sex, Income, Fam. Size, 1st Lang., Age
MAT 2 Reading   Spring     .39      .03      149  1st Lang., Sex, Fam. Size, Income, Age
MAT 2 Math      Fall       .47      .06      165  Ethnicity, Mom Ed., 1st Lang., Mom Occ., Age
MAT 2 Math      Spring     .45      .05      146  Ethnicity, Mom Ed., Mom Occ., Sex, 1st Lang.
MAT 3 Reading   Fall       .38      .06      136  Income, Age, Fam. Size, Mom Occ., Mom Ed., Sex
MAT 3 Reading   Spring     .39      .04      131  Income, Fam. Size, Mom Occ., Age, Mom Ed.
MAT 3 Math      Fall       .33      .03      129  Income, Mom Occ., Ethnicity, Receives Aid
MAT 3 Math      Spring     .41      .02      127  Mom Occ., Sex, Receives Aid, Mom Ed., Ethnicity



Table 7: Predicting First- and Second-Grade Reading and Math Scores
From Kindergarten Tests and Background Variables (Cohort IV)

Best Subtest

Outcome Test             Test Time  Adj. R²  n    Subtest
MAT Reading 1st Grade    K Spring   .44      437  MAT Primer Reading
MAT Math 1st Grade       K Spring   .49      424  MAT Primer Numbers
MAT Reading 2nd Grade    K Spring   .38      391  MAT Primer Reading
MAT Math 2nd Grade       K Spring   .45      377  MAT Primer Math

Best Set of Subtests

Outcome Test             Test Time  Adj. R²  Inc. R²  n    Subtests
MAT Reading 1st Grade    K Spring   .50      .06      424  MAT Primer Reading, Math
MAT Math 1st Grade       K Spring   .52      .03      424  MAT Primer Math, Reading
MAT Reading 2nd Grade    K Spring   .41      .03      381  MAT Primer Reading, Math
MAT Math 2nd Grade       K Spring   .49      .04      377  MAT Primer Math, Reading

Background Variables Added to Best Tests

Outcome Test             Test Time  Adj. R²  Inc. R²  n    Variables Added
MAT Reading 1st Grade    K Spring   .53      .03      424  Receives Aid, Family Size, Sex, Mom Ed., 1st Language
MAT Math 1st Grade       K Spring   .53      .01      424  Mom Ed., Ethnicity
MAT Reading 2nd Grade    K Spring   .45      .04      381  Mom Ed., Receives Aid, Family Size, Income, Ethnicity, Mom Occ., Sex
MAT Math 2nd Grade       K Spring   .50      .01      377  Income, Mom Ed., Sex



Tables 8 and 9 and graphed in Figures 2, 3, 4, and 5.* Table 8 and Figures
2 and 3 display two patterns: the comparative predictive power of the three
procedures and the effects of predicting outcomes at later and later times.
Thus medians for this table and these figures are calculated for each selection
procedure and for each outcome time. These medians combine predictor test
time (prekindergarten and kindergarten) and type of outcome measure (reading
and math).

The patterns in Table 8 and Figures 2 and 3 are similar for the two
cohorts. In both cases, background variables alone have some predictive
power but not a great deal. Using just one test or subtest results in higher
R²s. Adding further tests and subtests increases the R²s still more. And
combinations of tests and background variables always do the best. In
addition we see a consistent decline in R²s as the time between prediction
and outcome increases.

The relationship of predictive power to the time between measurement
points is more thoroughly explored in Table 9 and Figures 4 and 5. To do
this, medians were calculated for each prediction time and for each outcome
time. These medians combine the three sets of predictor variables and the
two outcome measures. In both cohorts we see that R²s are highest when
prediction of first-grade scores takes place in spring of kindergarten
(the shortest time span between prediction and outcome) and decline as
the time between prediction and outcomes grows longer. This phenomenon



* To further clarify what we are doing, we will reproduce one calculation from
Table 8. The median R² (.34) in the upper left-hand corner was obtained as
follows. The median was taken for R²s of all first-grade outcomes (reading and
math) and for all prekindergarten and kindergarten single-test predictions
for cohort III. Thus a median is obtained for the R²s .15, .25, .25, .35
(from Table 4), and .33, .39, .45, and .46 (from Table 6).
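
The calculation in this footnote can be reproduced in a few lines. This is a
small sketch using the eight R²s listed above; the variable names are chosen
here for illustration only.

```python
from statistics import median

# Single-test R-squared values for first-grade outcomes, cohort III,
# as listed in the footnote.
table4_first_grade = [0.15, 0.25, 0.25, 0.35]  # prekindergarten tests
table6_first_grade = [0.33, 0.39, 0.45, 0.46]  # kindergarten tests

# With eight values the median is the mean of the fourth and fifth
# values after sorting (.33 and .35).
median_r2 = median(table4_first_grade + table6_first_grade)
print(round(median_r2, 2))
```

Rounded to two places, this gives the .34 reported in the upper left-hand
corner of Table 8.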






Table 8: Median R²s for Three Sets of Predictor Variables

                                  Cohort III                        Cohort IV
                                  1 Reading  2 Reading  3 Reading   1 Reading  2 Reading
                                  and Math   and Math   and Math    and Math   and Math

Individual Test                   .34        .21        .20         .40        .36
(Both Pre K and K)

Set of Tests                      .40        .28        .24         .47        .42
(Both Pre K and K)

Tests and Background Variables    .45        .34        .28         .49        .48
(Both Pre K and K)



Table 9: Median R²s for Three Prediction Times

                                  Cohort III                        Cohort IV
Prediction Time                   1 Reading  2 Reading  3 Reading   1 Reading  2 Reading
(3 Sets Combined)                 and Math   and Math   and Math    and Math   and Math

Pre K Fall                        .22        .14        .14         .32        .34
Pre K Spring                      .32        .20        .19         .48        .44
K Fall                            .45        .35        .31
K Spring                          .48        .38        .37         .51        .45




Figure 2: Median R²s for Three Sets of Predictor Variables (Cohort III)

[Figure: median R² (vertical axis, .10 to .50) plotted against reading and
math measurement time (1st, 2nd, and 3rd grade) for tests and background
variables and for the set of tests, with a reference line for the median R²
for background variables only.]

Figure 3: Median R²s for Three Sets of Predictor Variables (Cohort IV)

Figure 4: Median R²s for Reading and Math Outcome Measures and All
Predictor Variables* (Cohort III)

[Figure: median R² plotted against prediction time (prekindergarten fall,
prekindergarten spring, kindergarten fall, kindergarten spring), with a
reference line for the median R² for background variables only.]

* Outcome measurements took place in the spring of first, second, and third grade.

Figure 5: Median R²s for Reading and Math Outcome Measures and All
Predictor Variables* (Cohort IV)

[Figure: median R² (vertical axis, .10 to .40) plotted against prediction time
(prekindergarten fall, prekindergarten spring, kindergarten spring), one line
per outcome time, with a reference line for the median R² for background
variables only.]

* Outcome measurements took place in the spring of first, second, and third grade.



is apparent in three ways. First, R²s for second- and third-grade outcomes
are generally lower than R²s for first-grade outcomes. Second, prediction
improves as the test time is moved from fall to spring for both prekindergarten
and kindergarten. Third, prekindergarten predictions do not do as well as
kindergarten predictions.

Regarding this last point, the higher predictive validity of kindergarten
tests over prekindergarten tests probably cannot be explained just as a factor
of different durations between prediction and outcome. The results here are
consistent with the view that kindergarten tests are more reliable and that
kindergarteners are developmentally better prepared to take tests. We
definitely see the "better test" effect in cohort III (Figure 4). The
prekindergarten tests used in cohort III (the PSI and the NYU booklets) do not
do much better than background variables at predicting later outcomes. When
we examine the R²s of the kindergarten tests in cohort III (the WRAT and the
MAT), we see a substantial increase in predictive power over the
prekindergarten tests. The nonlinear "jumps" in the data graphed in Figure 4
suggest that "better tests" as well as the passing of time may contribute to
increased R²s.

We do not see the same substantial increases in R²s of cohort IV (Figure
5). The R²s for the spring prekindergarten tests are nearly as large as
those for the kindergarten tests. These results may be due in part to the
use of better prekindergarten tests in cohort IV than in cohort III.
Specifically, the NYU booklets* are replaced in cohort IV by the WRAT. Thus,



* According to Walker, Bane, and Bryk (1973), these booklets "are shortened
versions of six Early Childhood Inventories which are being developed...at
the New York University School of Education" (p. 271). These authors make
the following evaluation of the NYU tests: "Neither Booklet 3D nor 4A is
an adequate achievement estimate alone since they both have low internal
reliability and the 3D has definite floor and ceiling effects" (p. 299).






if we compare Figures 4 and 5, we see some indication that the large
increases in R²s for cohort III might have been reduced if the "better" tests
used in cohort IV had also been used in cohort III.

Additional Data on Early Childhood Prediction

Shipman, McKee, and Bridgeman (1976), in their study of stability and
change in disadvantaged children's family variables, report findings that
parallel some of what we found in our re-analysis of the HSPV/FT data. In part
of the ETS Head Start Longitudinal Study, these authors examined how well
measures of family status, mothers' direct and indirect influence on children,
and one prekindergarten test predicted third-grade reading and math
achievement test scores.

Shipman et al. measured background variables such as number of
possessions in the home and mother's education, together with direct and
indirect process variables such as whether the mother reads to her child. The
authors also tested children two years before first grade with the PSI and
again in third grade with the Cooperative Primary Test. Thus Shipman's study
resembles our reanalysis of the HSPV/FT data in several respects. Both use
data from Head Start children. Both have a measurement time period from
prekindergarten to third grade. Both use background variables and a preschool
test to predict third-grade outcomes.

Table 10 shows the relevant results from the Shipman et al. report.
Overall their findings are similar to ours. Background variables account
for some of the variation in third-grade scores, and a single test adds an
appreciable amount to the R²s. In some respects the results of the two
studies differ, however. Shipman et al. included family process variables
as well as status variables, but the process variables added little to the






Table 10: Predicting Third-Grade Reading and Math From
A Wide Range of Early Childhood Variables

                                     Cooperative Primary Test
                                     Reading        Math
Additional Variables                 Third Grade    Third Grade
                                     R²             R²

Status/Situational
  Number of Possessions              .17            .21
  Crowding Index                     .24            .30
  Head of Household Occupation       .31            .38
  Race                               .35            .39
  Mother's Education                 .37            [illegible]

Mother/Child Interactions
  Reads to Children                  .38            .39
  Rational Punishment                .39            .40
  Responds to Child's Questions      .39            .40
  Physical vs. Verbal Punishment     .39            .40
  Expectation                        .39            [illegible]

Mother's Behavior
  Reads Magazines                    .39            .41
  Votes                              .39            .41
  No. of Groups a Member of          .39            .41

Preschool Inventory (PSI)            .46            .48

Adapted from Shipman, McKee, and Bridgeman, 1976, pp. 150-155.




accuracy of prediction of later scores. The main difference between their
data and ours is their finding of substantial R²s when using simple background
variables to predict third-grade test scores. Their R²s are more than twice
the analogous HSPV/FT results. Part of this could be due to differences in the
sets of variables. The Shipman study included information on the home
environment, such as number of possessions and a crowding index, which was not
available from the HSPV/FT data. These additional variables may have added to
the power of their predictor variables. Moreover, the sample of children that
Shipman and her co-authors studied differed in several ways from the HSPV/FT
sample, and there is some indication that it was more varied in terms of
background variables than the latter. For example, mother's education
averaged about 11 years with a standard deviation of about three years for
the last year of data that the Shipman group analyzed. For both cohorts
of the HSPV/FT data, mother's education averaged about 10.5 years with
approximate standard deviations of two years. That HSPV/FT children are
more homogeneous in their background variables reflects the fact that they
were in part selected on the basis of economic criteria. It is well
understood that the resulting restrictions in range reduce the predictive power
of the variables. Thus the amount of variance accounted for by background
variables in the HSPV/FT data may be relatively small because the range of
some of these variables has been restricted. Because their background
variables have wider range, Shipman, McKee, and Bridgeman's data better
estimate the predictive power of such variables for a somewhat more
diversified Head Start population.
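
The size of a range-restriction effect can be illustrated with the standard
formula for direct selection on a predictor. This is a rough sketch only: it
assumes selection acted directly on the predictor and that the relationship
is linear with constant residual variance, which may not hold here because
children were selected on economic criteria rather than on mother's
education itself. The full-range correlation of .40 is hypothetical; the
standard deviations of about three and two years are the ones quoted above.

```python
import math

def restricted_correlation(r_full, sd_restricted, sd_full):
    """Correlation expected in a range-restricted sample.

    Assumes direct selection on the predictor, so that only the predictor's
    standard deviation shrinks (from sd_full to sd_restricted).
    """
    u = sd_restricted / sd_full
    return r_full * u / math.sqrt(1.0 - r_full**2 + (r_full**2) * (u**2))

r_full = 0.40  # hypothetical correlation in the more varied sample
r_restricted = restricted_correlation(r_full, sd_restricted=2.0, sd_full=3.0)
print(round(r_restricted, 3), round(r_full**2, 3), round(r_restricted**2, 3))
```

Under these assumptions the variance accounted for falls by about half, which
is in line with the direction of the difference between the two studies.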




Measuring Misclassification to Assess ECT-I Prediction Strategies

The use of R²s and increments to R² is one way to evaluate the predictive
validity of ECT-I variables. If one set of variables results in a higher
R² than another it may make sense to include those variables in an ECT-I
selection strategy. But R²s provide only one measure of prediction
effectiveness. Since ECT-I selection involves identifying the most
educationally disadvantaged children to receive Title I services, an
alternative assessment of potential selection variables is to examine how well
individual variables and sets of variables classify children. In this section,
then, we will illustrate how analysis of misclassification rates can be used
to evaluate potential selection variables.

The identification of educationally disadvantaged children to receive
Title I services may be viewed as a problem of categorical classification.
Based on test results, teacher judgment, or other information, school systems
try to distinguish children who are educationally disadvantaged from those who
are not. At the early childhood level, especially before children enter first
grade, educational disadvantage is often hard to define. If this identification
process is viewed in a predictive manner, the goal is to identify children
who will be educationally disadvantaged after they enter school, so that they
can receive the benefit of ECT-I services.

There are four possible results from such an attempt to identify future
educationally disadvantaged children. First, there are two ways in which
prediction can be consistent with subsequent performance: a child predicted
to experience future disadvantage actually does show it in future performance
(a "true positive" identification of disadvantage), or a child predicted
not to show later disadvantage does not in fact show it in later school




performance (a "true negative" identification). Second, there are two errors
or misclassifications in such an identification process: a child predicted
to show disadvantage in later performance does not in fact show it (a "false
positive" identification), or a child not predicted to show later disadvantage
does (a "false negative" identification). From this perspective, one way to
assess the ECT-I selection process is by examining the misclassification
associated with different selection information.
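
The four possible results can be written out as a small tally. This is an
illustrative sketch only; the function name and the six example children are
hypothetical and do not come from the HSPV/FT data.

```python
from collections import Counter

def classify(predicted_disadvantaged, actually_disadvantaged):
    """Name the outcome of one prediction, using the four categories above."""
    if predicted_disadvantaged and actually_disadvantaged:
        return "true positive"
    if not predicted_disadvantaged and not actually_disadvantaged:
        return "true negative"
    if predicted_disadvantaged:
        return "false positive"  # predicted disadvantage that did not develop
    return "false negative"      # disadvantage that was not predicted

# Hypothetical children: (predicted disadvantaged?, actually disadvantaged?)
children = [
    (True, True), (True, False), (False, False),
    (False, True), (False, False), (True, True),
]
counts = Counter(classify(p, a) for p, a in children)
print(dict(counts))
```

The two error categories are the ones whose rates are examined in the
remainder of this section.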

We were able to estimate rates of these two misclassifications for several
combinations of variables using the HSPV/FT data. We illustrate this approach
using third-grade reading scores as a criterion of later performance. As a
rough indicator of later educational disadvantage, we may define children
scoring at or below the 25th percentile of the national norms for MAT Total
Reading as being educationally disadvantaged in reading.* Table 11 shows
the four possible results from using this criterion. By third grade about
35 percent of the children in our HSPV/FT sample scored at or below the 25th
percentile.

We used predicted scores to forecast which children would score at or
below the 25th percentile in the third grade. These scores were calculated
from the regression equations from our previous analysis. Thus we were able
to calculate predicted scores using only background variables, using one
prekindergarten test, using one kindergarten test, and using a combination
of background variables and tests. We would then predict that children
would show later educational disadvantage if their predicted third-grade
MAT reading score fell at or below the 25th percentile.






* Although this criterion is not hard and fast, it has some precedent. For
instance, Becker (1977) used the 25th percentile on the MAT to estimate
entry-level performance of Follow Through students (pp. 526-528).




Table 11: Four Possible Results from Comparing Predicted and Actual Performance

                                 Actual Performance
                                 Children Score           Children Score
                                 Above 25th Percentile    Below 25th Percentile
                                 (No Disadvantage         (Disadvantage
Predicted Performance            Develops)                Develops)

Predicted Score Above            True Negative            False Negative
25th Percentile
(No Disadvantage Predicted)

Predicted Score at or Below      False Positive           True Positive
25th Percentile
(Disadvantage Predicted)



By combining information on predicted scores with information on who
actually fell below our criterion score (the 25th percentile), we were able
to evaluate several prediction strategies in terms of two misclassification
rates, which correspond to the upper right corner and lower left corner of
Table 11. The other cells in Table 11 represent correct or consistent
predictions.

When we first carried out this analysis, we used the 25th percentile
score as our criterion. We found that this approach resulted in many more
false negative errors than false positives. We therefore decided to try the
34th and 40th percentile scores as prediction criteria while keeping the
performance criterion at the 25th percentile.
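
The effect of raising the prediction criterion while holding the performance
criterion fixed can be sketched as follows. The ten (predicted, actual)
percentile pairs are hypothetical and chosen only to show the direction of
the trade-off; the function name is also an assumption of this sketch. Rates
are expressed as percentages of all children.

```python
def misclassification_rates(scores, prediction_cutoff, performance_cutoff=25):
    """Return (false positive %, false negative %) for one prediction cutoff.

    scores: list of (predicted percentile, actual percentile) pairs.
    A child is predicted disadvantaged if the predicted score is at or below
    prediction_cutoff, and is actually disadvantaged if the actual score is
    at or below performance_cutoff.
    """
    n = len(scores)
    false_pos = sum(
        1 for pred, actual in scores
        if pred <= prediction_cutoff and actual > performance_cutoff
    )
    false_neg = sum(
        1 for pred, actual in scores
        if pred > prediction_cutoff and actual <= performance_cutoff
    )
    return 100 * false_pos / n, 100 * false_neg / n

# Hypothetical children: (predicted percentile, actual percentile)
scores = [
    (20, 15), (30, 20), (38, 24), (45, 22), (28, 40),
    (36, 50), (50, 60), (60, 30), (70, 80), (33, 10),
]
for cutoff in (25, 34, 40):
    print(cutoff, misclassification_rates(scores, cutoff))
```

In this made-up example, as in the analysis reported here, a higher
prediction cutoff trades false negatives for false positives.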

Table 12 shows the results of the analysis for several prediction
strategies. Each row presents results for a different strategy: using
only one prekindergarten test in the fall, using a prekindergarten test and
background variables in the fall, etc. The three sets of three columns present
the results obtained when the 25th, 34th, and 40th percentiles were used as
the criterion. The last column shows the R² from the regression analysis
for each prediction strategy.

Strategies can be compared in three ways: by examining the rate of
error 1 (false positives), the rate of error 2 (false negatives), and the
uncertainty coefficient.* The last indicates "the proportion by which



* There are other statistics for measuring misclassification rates. Subkoviak
(1980), for example, in a discussion of the reliability of mastery classification
decisions, recommends Cohen's kappa when scores from two forms of a
criterion-referenced test are available. This coefficient measures the
reliability of the two forms in classifying children as either "masters" or
"nonmasters" of the items tested. Another approach is asymmetric lambda, which
"measures the percentage of improvement in our ability to predict the value of
the dependent variable once we know the value of the independent variable"
(Nie et al., 1975, p. 225). Of course, results will differ somewhat depending
on the statistic used.



Table 12: Misclassification Rates for Strategies Using Three Cutoff
Scores to Predict Third-Grade Reading Scores

                      25th Percentile            34th Percentile            40th Percentile
                      Error   Error   Uncert.    Error   Error   Uncert.    Error   Error   Uncert.
Strategy              1       2       Coeff.     1       2       Coeff.     1       2       Coeff.     R²

Background Variables  7.1     27.7    .072       20.6    14.2    .071       29.8    5.7     .094       .15
Best Test PK Fall     7.5     30.8    .037       23.4    18.7    .0187      34.6    7.5     .037       .14
Best Test PK Spring   8.0     31.0    .033       22.1    13.3    .0675      31.0    5.3     .091       .19
Best Test K Fall      13.3    19.9    .077       26.5    7.8     .097       32.5    4.8     .090       .22
Best Test K Spring    6.5     23.2    .114       12.3    14.8    .148       25.8    7.7     .107       .29
BT & BV PK Fall       7.1     31.0    .068       19.7    7.9     .154       22.4    5.8     .165       .14
BT & BV PK Spring     10.1    29.2    .045       18.3    11.3    .127       25.2    4.3     .159       .19
BT & BV K Fall        9.6     23.0    .089       15.5    17.9    .081       25.0    8.3     .078       .32
BT & BV K Spring      7.1     25.4    .092       19.1    14.6    .076       22.5    6.7     .137       .35
All Variables Fall    6.6     18.4    .209       20.0    9.6     .134       22.2    5.2     .188       .36
All Variables Spring  8.7     21.7    .122       15.9    14.3    .116       24.6    6.3     .138       .38

Error 1 = False positive error
Error 2 = False negative error
BT & BV = Best test and background variables

'uncertainty' in the dependent variable [here, whether or not the child
scored below our cut-off score] is reduced by knowledge of the independent
variable [whether or not a low score for that child was predicted]" (Nie,
Hull, Jenkins, Steinbrenner, & Bent, 1975, p. 226). The uncertainty
coefficient ranges from 0.0, which indicates no improvement with knowledge
of the independent variable, to 1.0, which indicates complete elimination
of uncertainty about the dependent variable given knowledge of the independent
variable.
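
One common way to compute an asymmetric uncertainty coefficient from a
two-by-two table is as the proportional reduction in entropy of the actual
outcome once the prediction is known. The sketch below takes that
entropy-based definition as an assumption; it is not taken from the program
the authors used, and the example tables are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log), ignoring zero-probability cells."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_coefficient(table):
    """Proportion of uncertainty in the actual outcome removed by the prediction.

    table[i][j] is the number of children with predicted category i
    (independent variable) and actual category j (dependent variable).
    """
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    joint_p = [cell / total for row in table for cell in row]
    h_actual = entropy(col_p)
    # Conditional entropy H(actual | predicted) = H(joint) - H(predicted).
    h_actual_given_pred = entropy(joint_p) - entropy(row_p)
    return (h_actual - h_actual_given_pred) / h_actual

perfect = [[50, 0], [0, 50]]      # prediction always matches the outcome
unrelated = [[25, 25], [25, 25]]  # prediction carries no information
partial = [[55, 20], [10, 15]]    # hypothetical counts in between
print(uncertainty_coefficient(perfect),
      uncertainty_coefficient(unrelated),
      round(uncertainty_coefficient(partial), 3))
```

The first two tables mark the two ends of the range described above; the
third shows that a table with many correct classifications can still yield a
small coefficient.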

«** * 

In most respects, the results of the error rate analysis parallel the
findings from the regression analysis. We see a fairly consistent
improvement in the strategies over the use of only background variables. We
again see prediction improving as the prediction time is moved closer to the
time when outcomes are measured. Overall, however, the prediction results
viewed in terms of error rates and uncertainty coefficients seem less
impressive than the R²s. For example, use of a test and background variables
from the spring of kindergarten to predict third-grade reading scores results
in a combined misclassification rate of 32% and a reduction in uncertainty of
only 9%, whereas the R² is 35%.

It is important to note that in using an error rate analysis to assess
the predictive validity of early childhood variables the rates of
misclassification are influenced by choosing different criterion scores for
prediction. As shown in Table 12, using the 25th percentile produced many more
false negatives than false positives, the 34th percentile resulted in roughly
the same percentages of both errors, and the 40th percentile produced more
false positives.






More generally, note that when deciding on variables to include in a
selection strategy, one needs to consider the economic and social costs
associated with these different errors. If one believes the costs are
similar, a criterion score that equalizes error rates is indicated. But
if one thinks that missing a child who needs help is worse than helping one
that does not, a score that minimizes false negative errors is preferable.
If, however, erroneous prediction of disadvantage is seen as worse, more
weight should be given to reducing false positive misclassification.
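
One way to make this trade-off concrete is to weight each error rate by its
assumed cost and compare cutoffs. The sketch below is an illustration, not a
procedure from this paper: the cost weights are hypothetical, minimizing a
weighted sum is only one possible decision rule, and the error rates are
those reported in Table 12 for a test plus background variables in the spring
of kindergarten.

```python
# (false positive %, false negative %) by prediction cutoff percentile,
# from the "BT & BV K Spring" row of Table 12.
rates = {25: (7.1, 25.4), 34: (19.1, 14.6), 40: (22.5, 6.7)}

def weighted_cost(cutoff, cost_false_pos, cost_false_neg):
    """Weighted sum of the two error rates for one cutoff."""
    false_pos, false_neg = rates[cutoff]
    return cost_false_pos * false_pos + cost_false_neg * false_neg

def best_cutoff(cost_false_pos, cost_false_neg):
    """Cutoff with the smallest weighted error under the assumed costs."""
    return min(rates, key=lambda c: weighted_cost(c, cost_false_pos, cost_false_neg))

print(best_cutoff(1, 1))  # errors treated as equally costly
print(best_cutoff(1, 2))  # missing a child who needs help counts double
print(best_cutoff(3, 1))  # serving a child who does not need help counts triple
```

With these particular rates, the lowest cutoff is favored only when false
positives are treated as considerably more costly than false negatives.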
Summary of the Findings from the HSPV/FT Data 

We have learned and confirmed several things that bear on the discussion
of predicting later educational outcomes from measures of early childhood
variables:

• Although the predictive power of background variables in the HSPV/FT
  data was modest at best, such variables seem to have a place in
  selecting children for ECT-I programs. One reason for including
  background variables in a selection strategy is that their
  predictive power does not seem to decrease over time. In addition,
  we have evidence from the Shipman et al. study that background
  variables may have greater predictive power for populations that
  are more diverse than the HSPV/FT groups.

• In some cases, one test or subtest does fairly well in predicting
  later outcomes, especially when prediction occurs during kindergarten.
  Moreover, some tests do much better than others. The WRAT and the
  shorter version of the PSI did best in the HSPV/FT data set.

• Time between test points influences the predictive power of early
  childhood tests. The longer the time, the less accurate the
  prediction.

• In addition to the influence of time, there is some evidence
  suggesting that the poor quality of prekindergarten tests and
  the difficulty of testing very young children reduce the predictive
  power of tests given during prekindergarten. This, in turn, may
  argue for relying more on other indicators such as background
  variables or teacher judgment for selecting children for pre-K ECT-I
  programs.






• The misclassification analysis also illuminated the importance of
  two different errors any prediction strategy can make: the false
  positive classification and the false negative classification.
  The relative costs of these errors should be considered when
  assessing variables for an ECT-I selection strategy.

META-ANALYSIS OF PREDICTIVE STUDIES

Meta-analysis, a term coined by Gene Glass (1976, 1977), is a strategy
for quantitatively combining the results of similar studies. It involves
examining as many published and unpublished studies as possible in the area
of interest. The analysis then proceeds by determining summary statistics
(such as effect sizes or correlation coefficients) from each study, aggregating
these statistics, and obtaining a distribution of study statistics for which
a mean, median, standard deviation, and other descriptors are calculated.
Meta-analysis rests on the assumption that each study in an area of inquiry
is analogous to sampling from a population of interest and estimating the
population parameter. Thus, Glass argues, averaging study results produces
an accurate estimate of the parameter in question. Glass acknowledges that
some studies are better than others and should be weighted more heavily.
To determine whether studies should be weighted according to study
characteristics, he advises computing correlations between characteristics of
interest and the magnitude of correlations. If correlations are substantial,
these characteristics should be taken into account in combining studies. If
there is little relationship, Glass maintains that the characteristics can be
discounted.
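
The two steps just described (summarizing the distribution of study statistics,
then checking whether a study characteristic is related to the size of the
correlations) can be sketched as follows. The five correlations and the
"published" flags are invented for illustration, and the helper names are
hypothetical; this is a minimal sketch, not the procedure used by Glass or in
this paper.

```python
import statistics

# Hypothetical study-level data, invented for illustration only.
STUDY_RS = [0.30, 0.40, 0.50, 0.35, 0.45]   # one correlation per study
PUBLISHED = [0, 1, 1, 0, 1]                 # 1 = published, 0 = unpublished


def summarize(correlations):
    """Describe the distribution of study correlations."""
    return {
        "n": len(correlations),
        "mean": statistics.mean(correlations),
        "median": statistics.median(correlations),
        "sd": statistics.stdev(correlations),  # sample standard deviation
    }


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


summary = summarize(STUDY_RS)
print(summary)

# A substantial value here would suggest taking the characteristic into
# account when combining studies; a value near zero would suggest it can
# be discounted.
print(pearson_r(PUBLISHED, STUDY_RS))
```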

Meta-analysis is a plausible approach for looking at the predictive
validity of individual variables that might be used in selecting ECT-I
children. As we stated earlier, no comprehensive study has been done on
the predictive validity of ECT-I selection strategies, but there are hundreds
of studies that contain bits of relevant information. For example, scores
of studies over half a century have examined the predictive validity of
readiness tests. Meta-analysis can be used to assess the overall predictive
power of such tests. Likewise, many authors and practitioners argue that
teacher judgment is as good a selection mechanism as readiness scores.
Meta-analysis can be used to combine studies of teacher judgment, and the
results can then be compared to the predictive validity of readiness tests.

Scope of the Analysis

While planning this meta-analysis, we decided to focus on reading,
math, and language arts outcomes (as measured by both standardized test
scores and school grades), since these are the primary areas of interest
in early elementary Title I programs. We next made a list of possible
predictor variables such as sex, race, test scores, teacher judgment,
measures of socio-economic status (SES), and family variables. (For the
initial list of predictors and outcomes, see Table 13.) We decided to look
at studies that examined relationships (usually simple correlations) between
one or more of these outcomes and one or more predictors.

The list of predictor and outcome variables needs some further
explanation. As one can readily see, we included a wide variety of predictor and
outcome variables in our initial list. Later we found it necessary to
eliminate some predictors and outcomes because too few studies containing
those variables could be located. A few of the variable labels in Table 13
require some description. Items in the home refer to family possessions
such as vacuum cleaners and television sets, which are often used as indicators
of social class. Other SES measures include scales used to assess social






Table 13: Predictor and Outcome Variables Sought in
Studies Assessed by the Meta-Analysis

Predictor Variables

    Sex
    Age
    Race
    Income
    Father's Education
    Mother's Education
    Father's Occupation
    Mother's Occupation
    Items in the Home
    Other SES Measures
    Sibling Variables
    Family Variables
    Teacher Judgment:     PK II, PK I, K, 1
    Reading Readiness:    PK II, PK I, K, 1
    Other Readiness:      PK II, PK I, K, 1
    IQ Tests:             PK II, PK I, K, 1
    Other Tests:          PK II, PK I, K, 1
    Parents' Desires
    Prior School Experience

Outcome Variables

    Reading Achievement:      1*, 2, 3, 4, 5, 6
    Math Achievement:         1, 2, 3, 4, 5, 6
    Language Arts Ach.:       1, 2, 3, 4, 5, 6
    IQ Test:                  1, 2, 3, 4, 5, 6
    Composite Achievement:    1, 2, 3, 4, 5, 6
    Reading Grades:           1, 2, 3, 4, 5, 6
    Composite Grades:         1, 2, 3, 4, 5, 6
    Other Measures:           1, 2, 3, 4, 5, 6

* Arabic numerals are grade levels: 1 = first grade, 2 = second grade, etc.







class. Sibling variables refer to such things as number of brothers and
sisters, birth order, and siblings' eligibility for compensatory education.
Family variables include measures such as assessment of parent-child
interaction.

Teacher judgment* and all early childhood tests were grouped according
to when assessment took place. PK II refers to a test time two years prior
to kindergarten; PK I, to one year prior. First grade (1) refers to fall
of first grade for teacher judgment and early childhood tests. The other
readiness tests include composite readiness scores and subtest scores other
than reading readiness subtests. Other tests include socio-emotional and
psycho-perceptual tests such as the Bender and the Wepman. If we were
unsure where a test fit, we consulted Buros (1972) and followed his
categorization.

* Teacher judgment was assessed in a variety of ways, from 5-point scales
to elaborate questionnaires.

Initially, we categorized study outcomes according to achievement
test scores, IQ test scores, school grades, and other measures, and sought
studies that reported these outcome measures for the first to the sixth
grade. We categorized a first-grade measure as an outcome only if it was
obtained in the spring of first grade. For other grades, we made no
distinction between fall and spring.

Locating Studies

We used several methods to find studies. We made an ERIC search of
journals and ERIC documents.* We consulted literature reviews (e.g., Bryant,
Glaser, Hansen & Kirsch, 1974) and other meta-analyses (for example, White,
1976), searched through dissertation abstracts, and reviewed the indices
of relevant journals for the last ten years.** Finally, we examined the
bibliographies of articles, books, and reports that we reviewed. In all,
approximately 300 studies were read. These are listed in part II of the
bibliography.

Criteria for Including Studies in the Analysis 

To be included, a study had to report at least one measure of a
relationship between an early childhood predictor and a later outcome. Most of the
studies we included reported simple correlations. Others reported statistics
that could be converted into correlation coefficients. (See Glass, 1977, for
details on converting various statistics to Pearson r's.) Studies that
reported only multiple regression analyses without correlation matrices were
excluded, since simple r's could not be retrieved. Because children develop
rapidly during early childhood, we discarded any study that did not report at
least approximate indications of children's ages for the times when predictor
and outcome variables were measured. Some articles reported ages in months,
others in years and fractions of years; and still others reported the grades and
seasons when tests were given. To standardize our coding of ages, we decided to
record grades and seasons when measurements were made. Table 14 shows how
we converted ages into grades and seasons. Finally, we omitted any study



* We first selected all studies with the keywords Early Childhood, then from
all early childhood studies selected those with the keywords Predictive
Validity, Siblings, Achievement, Failure-Success Prediction, Reading Readiness,
SES, or Parent-Child Relations.

** The journals were American Educational Research Journal, Child Development,
Educational and Psychological Measurement, Harvard Educational Review, Journal
of Educational Psychology, Journal of Learning Disabilities, School Review,
and Teachers College Record.



Table 14: Children's Ages and Corresponding Grades and Seasons of the Year

    Age (years)        Grade and Season

    2.5 to 3.0         Fall Pre-K II
    3.1 to 3.5         Spring Pre-K II
    3.6 to 4.0         Fall Pre-K I
    4.1 to 4.5         Spring Pre-K I
    4.6 to 5.0         Fall K
    5.1 to 5.5         Spring K
    5.6 to 6.0         Fall First Grade (1)
    6.1 to 6.6         Spring First Grade (1)
    7                  Second Grade (2)
    8                  Third Grade (3)
    9                  Fourth Grade (4)
    10                 Fifth Grade (5)
    11                 Sixth Grade (6)




that did not report sample size. Based on these criteria of acceptability,
119 of the 300 studies initially identified were included in the analysis.
(These are identified with asterisks in part II of the bibliography.)
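
The inclusion criteria and the age coding of Table 14 can be expressed as a
small sketch. The record fields (`has_simple_r`, `ages_reported`,
`sample_size`) and function names are hypothetical, chosen only for
illustration; ages that fall outside the ranges listed in Table 14 are
returned as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

# Age ranges (in years) and labels, transcribed from Table 14.
AGE_RANGES = [
    (2.5, 3.0, "Fall Pre-K II"),
    (3.1, 3.5, "Spring Pre-K II"),
    (3.6, 4.0, "Fall Pre-K I"),
    (4.1, 4.5, "Spring Pre-K I"),
    (4.6, 5.0, "Fall K"),
    (5.1, 5.5, "Spring K"),
    (5.6, 6.0, "Fall First Grade"),
    (6.1, 6.6, "Spring First Grade"),
]
WHOLE_YEAR_GRADES = {
    7: "Second Grade", 8: "Third Grade", 9: "Fourth Grade",
    10: "Fifth Grade", 11: "Sixth Grade",
}


def age_to_grade_season(age_years: float) -> Optional[str]:
    """Map an age in years to the grade and season coding of Table 14."""
    age = round(age_years, 1)
    for low, high, label in AGE_RANGES:
        if low <= age <= high:
            return label
    if age == int(age):
        return WHOLE_YEAR_GRADES.get(int(age))
    return None  # age not covered by the table


@dataclass
class Study:
    # Hypothetical record layout, for illustration only.
    has_simple_r: bool            # reports r, or a statistic convertible to r
    ages_reported: bool           # at least approximate ages or grades given
    sample_size: Optional[int]    # None when the study does not report n


def meets_criteria(study: Study) -> bool:
    """Apply the three inclusion criteria described in the text."""
    return study.has_simple_r and study.ages_reported and study.sample_size is not None


print(age_to_grade_season(4.8))
print(meets_criteria(Study(True, True, 180)))
```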
Recording Study Characteristics

As noted earlier, the magnitude of correlations can vary with study
characteristics. (For example, White [1976] found that a somewhat stronger
relationship between socio-economic class and achievement test scores was
reported in published than in unpublished studies.) Therefore we decided
to record a wide range of such characteristics. For our analysis, we attempted
to record the following information:



    Date of the study
    Author's affiliation
    Source of the study (e.g., journal)
    Number of subjects
    Study population (local, regional, etc.)
    Percent minority in sample
    Attrition rate
    Whether children received some special program
    Independent variables
    Initial measurement time
    Means of independent variables
    Standard deviation of variables
    Reliability of independent variables
    Evidence of truncation in independent variables
    Outcome measures
    Outcome measurement time
    Means of outcome measurements
    Standard deviations of measurements
    Reliability of outcome measures
    Evidence of truncation in outcome measures
    Correlation between independent variable and outcome measure



Of course, few studies reported all of these characteristics. 






Analyzing the Results 

We had three main questions in mind when we examined the results of the
meta-analysis:

• On average, how does each early childhood variable correlate
  with various outcome measures?

• Do some variables have significantly higher correlations, indicating
  that they could be better predictors of later outcomes than other
  early childhood variables?

• To what extent do the results of the meta-analysis substantiate
  or differ from our findings from the HSPV/FT study?

Our first step was to calculate the average correlations between predictors
and outcomes. This resulted in a matrix of 40 predictors by 48 outcomes and
a total of 1291 correlations. Despite the large number of correlations, many
cells contain few or no correlations. Some predictors, such as PK II tests,
had no correlations. Some outcomes, including most from the later elementary
grades, had as few as one correlation. We decided to eliminate those
predictors and outcomes that had just a few cases or no cases at all.* By so
doing we reduced the meta-analysis so that it would more nearly parallel
the data available from the HSPV/FT study. We therefore concentrated on
reading, math, and language arts achievement test outcomes at the first,
second, and third grade levels. This is much like the HSPV/FT study, which
has reading and math outcomes in the first three grades. We also pared
down the number of predictors. Still included are the 12 background
variables (sex through family variables in Table 13), teacher judgment,
and the four types of early childhood tests. Again, this set of predictors



* Our unit of analysis in calculating averages and other statistics was
the correlation coefficient, not the study; thus some studies contributed
one correlation to the analysis, others contributed 20 or 30.



parallels the HSPV/FT study; but it also permitted us to look at a broader
range of preschool tests than was included in the HSPV/FT data, and at teacher
judgment, which was not included in the HSPV/FT data. Appendix A contains
the matrix of average correlations (821 in all) between all predictors and
the three outcomes: reading, math, and language arts achievement test scores.*
Appendix B contains correlations (624) between early childhood tests and
the three outcomes.

Some questions about the predictive validity of ECT-I selection variables
can be addressed by examining average correlations. Other questions require
further analysis. When we ask whether some variables do better than others
in predicting later outcomes, we are asking an inferential question, for we
want to know whether differences in correlations are just chance variations
or are statistically and educationally significant. To answer such questions
we prefer to use parametric statistics, which assume normally distributed
variables. Correlation coefficients, however, are not normally distributed.
(See Fisher, 1915; McNemar, 1969; and Cohen & Cohen, 1975.) Fortunately,
Fisher devised a method for transforming correlations into Fisher's z,
which approximates a normal distribution (Cohen & Cohen, 1975, pp. 50-52).

Transforming correlations into z's makes it sensible to use analysis
of variance (ANOVA) and other parametric procedures. The next subsection
illustrates how we used ANOVA to compare the average correlations of
different early childhood tests, different outcome measures, and different
measurement points.
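
For readers unfamiliar with the transformation, Fisher's z is the inverse
hyperbolic tangent of the correlation, z = 0.5 * ln((1 + r) / (1 - r)). The
sketch below shows the transformation and its inverse using only the Python
standard library; the function names are illustrative and the example values
are not taken from the studies analyzed here.

```python
import math


def fisher_z(r: float) -> float:
    """Transform a correlation r (with -1 < r < 1) into Fisher's z."""
    return math.atanh(r)  # equivalent to 0.5 * log((1 + r) / (1 - r))


def inverse_fisher_z(z: float) -> float:
    """Transform a Fisher's z value back into a correlation."""
    return math.tanh(z)


# The transformation changes small correlations very little and stretches
# large ones, which is what makes the z scale closer to normal.
for r in (0.10, 0.50, 0.90):
    print(r, round(fisher_z(r), 3))
```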



* These correlations are based on a combined sample size of 147,780.
Studies on average had an n of 180.



The Findings of the Meta-Analysis 

Discussing the findings of the meta-analysis is not easy. We are
examining average correlations from a large number of studies, which vary
in several ways. They vary in sample size, attrition rates, and other
study characteristics. They vary regarding the predictor variables (X's)
employed. They use different outcome measures (Y's). Prediction times (t_x)
also vary, as do outcome times (t_y). In principle, we could analyze
simultaneously all the ways in which studies vary. But such an analysis (e.g.,
a four-way ANOVA with several covariates) would be complex and difficult
to interpret if there were significant interactions. Instead, we have
decided to examine smaller pieces of the puzzle. We leave to the last a
look at the influence of study characteristics such as sample size, turning
first to differences resulting from varying predictors, outcome measures,
predictor times, and outcome times. At most we will discuss two of these
four dimensions at one time. To do this we will have to "average across"
the other two dimensions. For example, when we examine the effects of
different outcome measures (reading, math, and language arts tests) and
different outcome times (first grade, second grade, and third grade), we
average across different predictors (such as types of early childhood tests)
and predictor times (pre-K I, K, and first grade).
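
"Averaging across" two dimensions amounts to grouping the correlations by the
two dimensions of interest and taking the mean within each group. The sketch
below illustrates this with four invented correlations; the record layout and
function name are hypothetical. Because the unit of analysis is the
correlation, a plain mean within each group is already weighted by the number
of correlations in it.

```python
from collections import defaultdict
from typing import NamedTuple


class Corr(NamedTuple):
    # Hypothetical record: one correlation and the four dimensions it varies on.
    predictor: str        # X
    predictor_time: str   # t_x
    outcome: str          # Y
    outcome_time: int     # t_y (grade)
    r: float


RECORDS = [  # invented values, for illustration only
    Corr("IQ", "K", "reading", 1, 0.50),
    Corr("readiness", "PK I", "reading", 1, 0.40),
    Corr("IQ", "K", "math", 1, 0.30),
    Corr("readiness", "K", "reading", 2, 0.20),
]


def average_across(records, keep):
    """Average r within groups defined by the dimensions named in `keep`."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(getattr(rec, name) for name in keep)
        groups[key].append(rec.r)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Keep outcome measure and outcome time; average over predictors and
# predictor times.
cell_means = average_across(RECORDS, ("outcome", "outcome_time"))
print(cell_means)
```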

In examining variations in X's, Y's, t_x's, and t_y's, we concentrated
on five areas:

• Background variables as predictors

• Differences in predicting reading, math, and language arts outcomes

• Effect of time between prediction and outcome on predictive validity

• Predictive validity of different categories of early childhood tests

• Predictive validity of teachers' judgment.






We will discuss each of these in turn.

Background Variables. Tables 15 and 16 summarize the results for a
set of background variables. We are interested in whether the predictive
validity of background variables varies over outcome time (t_y) and with
different outcome measures (Y's). Thus we are looking for effects in these
two dimensions, and we are averaging across the different background variables
(X's). The number in the center of each cell in Table 15 is an average
correlation* computed across 12 background variables.** This table and the
accompanying two-way analysis of variance show two things about the predictive
validity of background variables. First, we see no difference in the overall
ability of background variables to predict different outcome tests: the mean
correlations between background variables and reading tests (.20), math
tests (.17), and language arts tests (.18) are not significantly different
(F = 1.65, df = 2, 91, p > .05). Second, we find no decrease in the predictive
power of background variables as they are used to predict later and later
outcomes (first through third grade). The average correlation between
background variables and first-grade outcomes is actually the lowest, but the
differences among means are not significant (F = 2.41, df = 2, 91, p > .05).
These two results parallel what was found in the HSPV/FT data.

Table 16 presents a further comparison between the meta-analysis and the
HSPV/FT results. The predictive power of the background variables represented
in both data sets is similar. Table 16 shows that, with a few exceptions,



* All average correlations are weighted by the number of cases (i.e., number
of correlations) per cell.

** Sex, age, race, income, father's education, mother's education, father's
occupation, mother's occupation, items in the home, other SES measures, sibling
variables, and family variables.



Table 15: Average Correlations Between a Set of Background Variables
and Reading, Math, and Language Arts Scores for Three Outcome Times*

                                      Outcome Time
    Outcome Tests      Grade 1              Grade 2     Grade 3     Row Averages

    Reading            .17 ([illegible])    .23 (13)    .22 (20)    .20
    Math               [illegible] (1)      .26 (5)     .16 (20)    .17
    Language Arts      .11 (1)              .21 (6)     .11 (1)     .18

    Column Averages    .16                  .23         .19

* The number in parentheses indicates the number of correlations per cell.

Analysis of Variance:
Fisher's z by Type of Outcome Test and Time of Outcome Test

    Source of             Sum of             Mean                Significance
    Variation             Squares    DF      Square     F        of F

    Main Effects          0.139       4      0.035      1.652    0.169
     - Outcome Test       0.070       2      0.035      1.649    0.199
     - Outcome Time       0.102       2      0.051      2.411    0.096
    2-Way Interactions    0.049       4      0.012      0.576    0.680
    Explained             0.188       8      0.023      1.114    0.36[illegible]
    Residual              1.751      83      0.021
    Total                 1.939      91      0.021






















Table 16: Comparison of Correlations Between Background Variables and
Outcomes from the Meta-Analysis and HSPV/FT Data Sets (Cohort III)

                                      Outcome Tests and Times
    Background         Reading       Reading       Reading               Math    Math    Math
    Variables          1             2             3                     1       2       3

    Sex                .19 / .06     .17 / .16     .13 / [illegible]     .20     .00     .01 / -.01
    Age                .05 / .15     .19 / .01     .01 / -.02            .01     .10     .08
    Race               .14 / .36     .23 / .21     .16 / .23             .27 / .15   .14 / .33   [illegible]
    Income             .21           .17           .07 / .20             .14     -.16    .12 / .20
    Mother's Educ.     .09           .19 / .10     .54 / -.02            .02     0       .26 / -.02
    Mother's Occ.      -.06          .22 / -.07    .18 / -.08            -.07    -.10    -.11

[Where a cell shows two values, the first is from the meta-analysis (Meta) and
the second from the HSPV/FT data (HS/FT). Where only one value is legible in
the source, the data set it belongs to cannot be determined from the scan.]






common correlations from the two analyses are similar. Like the HSPV/FT data,
the meta-analysis indicates that background variables are modestly correlated
with achievement outcomes.

Predicting Different Outcomes. Table 17 reports average correlations
between early childhood tests and reading, math, and language arts outcomes
at grades 1, 2, and 3. The number in the center of each cell is an average
correlation taken across all early childhood tests (reading readiness, other
readiness, IQ, and other tests) at three prediction times (prekindergarten I,
kindergarten, and grade 1). All averages are weighted by number of cases
per cell.
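
The weighting can be checked against the legible cells of Table 17. The sketch
below assumes that each row or column average is simply the mean of the cell
correlations weighted by the number of cases in each cell; whether the
original averages were formed on r or on Fisher's z is not stated, so this is
an approximation. The cell values and counts are those printed in Table 17,
and the function name is illustrative.

```python
def weighted_mean(cells):
    """Mean of cell correlations weighted by the number of cases per cell.

    `cells` is a list of (average_correlation, number_of_cases) pairs.
    """
    total_cases = sum(n for _, n in cells)
    return sum(r * n for r, n in cells) / total_cases


# (correlation, cases) pairs as printed in Table 17.
reading_row = [(0.44, 193), (0.33, 121), (0.38, 43)]
language_arts_row = [(0.39, 86), (0.27, 88), (0.34, 35)]
grade1_column = [(0.44, 193), (0.47, 27), (0.39, 86)]
grade3_column = [(0.38, 43), (0.35, 19), (0.34, 35)]

print(round(weighted_mean(reading_row), 2))        # table reports .40
print(round(weighted_mean(language_arts_row), 2))  # table reports .33
print(round(weighted_mean(grade1_column), 2))      # table reports .43
print(round(weighted_mean(grade3_column), 2))      # table reports .36
```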

Table 17 and the companion two-way ANOVA show some unexpected results.
Looking at the row averages, we see that the mean correlations for reading
and math outcomes are about the same but that the language arts mean is
considerably lower, and the analysis of variance shows a significant difference
among means (F = 8.18, df = 2, 615, p < .01). Instead of these findings,
we would expect that early childhood tests would differ in how well they
predict reading and math scores, since reading and solving math problems
presumably involve quite different skills. We would expect smaller
differences in the prediction of reading and language arts scores, since the
underlying skills are probably more closely related than reading and math
skills.

The analysis of variance also shows that the times of outcome
measurements are significantly different (F = 30.07, df = 2, 615, p < .01). However,
the column means indicate a pattern unlike what we would expect. Initially,
correlations go down: predictions of first-grade scores do better than
those of second-grade scores. But third-grade predictions are higher than





Table 17: Tests as Predictors of Reading, Math, and
Language Arts Outcomes*

                                   Outcome Times
    Outcome Tests      Grade 1      Grade 2                 Grade 3     Row Averages

    Reading            .44 (193)    .33 (121)               .38 (43)    .40
    Math               .47 (27)     .4[illegible] (12)      .35 (19)    .42
    Language Arts      .39 (86)     .27 (88)                .34 (35)    .33

    Column Averages    .43          .31                     .36

* The number in parentheses is the number of cases per cell.

Analysis of Variance:
Fisher's z by Type of Outcome Test and Time of Outcome Test

    Source of             Sum of              Mean                 Significance
    Variation             Squares     DF      Square     F         of F

    Main Effects           3.550        4     0.888      21.150    0.000
     - Outcome Tests       0.686        2     0.343       8.175    0.000
     - Outcome Time        2.523        2     1.262      30.066    0.000
    2-Way Interactions     0.125        4     0.031       0.746    0.561
    Explained              3.675        8     0.459      10.948    0.000
    Residual              25.808      615     0.042
    Total                 29.483      623     0.047






second-grade. This is puzzling, since we expect prediction to decline
monotonically. This seems to result from some complex confounding of study
characteristics and outcome measurement times. In looking for a precise
explanation, we explored the effects of several characteristics. We found, for
example, that attrition is related to the magnitude of correlations. However,
we were unable to find a completely satisfactory explanation for these results.

Types of Predictor Tests. Table 18 summarizes our findings for the
correlations between types of predictive tests (reading readiness, other
readiness, IQ, and other tests) given at three times (prekindergarten I,
kindergarten, and first grade) and the three outcome measures (reading,
math, and language arts). The numbers in the center of each cell are the
average correlation across the three outcome measures (Y's) and the three
outcome times (t_y's).

The unadjusted and adjusted means in Table 18 and the analyses of
variance again show some unexpected findings. The unadjusted means and
the ANOVA for type of predictor test indicate that reading readiness tests
do slightly better than other readiness and IQ tests. Other tests seem to
do worst. This finding supports our argument in the previous section
that predictive validity should be an important criterion for choosing a
selection test, since not all tests predict equally well. The results
regarding the effects of prediction time on the correlation between early
childhood tests and outcomes seem to contradict expectations. The first
analysis of variance (with unadjusted means) shows no significant difference
among the prediction times (F = 1.29, df = 2, 612, p > .05). Moreover,
the pattern of the unadjusted means is unexpected: tests given in
prekindergarten appear to be the best predictors of later outcomes.

At first we thought that there might be some confounding between the
time at which the predictor test was given and the total time between

Table 18: Average Correlations Between Types of Predictor Tests and
Later-Grade Outcomes, Unadjusted and Adjusted for Time

                                                      Means Adjusted for Time
                                 Unadjusted Means     Between Measurements
    Predictor Test:
      Reading Readiness Tests    .47 (264)            .47 (264)
      Other Readiness Tests      .41 ( 67)            .40 ( 67)
      IQ Tests                   .41 ( 80)            .40 ( 80)
      Other Tests                .29 (213)            .30 (213)

    Predictor Time:
      Prekindergarten            .42 ( 41)            .52 ( 41)
      Kindergarten               .37 (348)            .40 (348)
      First Grade                .42 (235)            .38 (235)

Analysis of Variance:
Fisher's z by Predictor Test and Predictor Time

    Source of             Sum of              Mean                 Significance
    Variation             Squares     DF      Square     F         of F

    Main Effects           5.522        5     1.104      28.989    0.000
     - Predictor Test      5.010        3     1.670      43.834    0.000
     - Predictor Time      0.098        2     0.049       1.291    0.276
    2-Way Interactions     0.646        6     0.108       2.824    0.010
    Explained              6.167       11     0.561      14.717    0.000
    Residual              23.316      612     0.038
    Total                 29.483      623     0.047

Analysis of Variance:
Fisher's z by Predictor Test and Predictor Time,
Controlling Time Between Measurements

    Source of                  Sum of              Mean                     Significance
    Variation                  Squares     DF      Square     F             of F

    Covariates                  1.069        1     1.069      29.081        0.000
     - Total Time Between
       Measurement Points       1.069        1     1.069      29.081        0.000
    Main Effects                5.436        5     1.087      29.580        0.000
     - Predictor Test           4.430        3     1.477      [illegible]   0.000
     - Predictor Time           0.717        2     0.358       9.752        0.000
    2-Way Interactions          0.519        6     0.087       2.354        0.030
    Explained                   7.024       12     0.585      15.925        0.000
    Residual                   22.459      611     0.037
    Total                      29.483      623     0.047










predictor test and outcome test. If the time between tests for studies
using prekindergarten tests tended to be shorter than the time between
tests when kindergarten and first-grade tests are used, the average
correlation from the prekindergarten studies might be equal to or larger than
the averages from kindergarten and first-grade studies. For example, if
most prekindergarten tests were used to predict first-grade outcomes whereas
most kindergarten and first-grade tests were used to predict third-grade
outcomes, the prekindergarten tests might appear to do as well as or even
better than the kindergarten and first-grade tests. To test this hypothesis,
we entered the total time between tests as a covariate and then performed
the two-way ANOVA again. The column of adjusted means in Table 18 shows
that the average correlation for prekindergarten tests increases by
nearly 25 percent while the other means hardly change when time between
tests is taken into account. Moreover, the differences among these means
become statistically significant (F = 9.75, df = 2, 611, p < 0.001). This is
a surprising result, quite at odds with our findings from the HSPV/FT data.
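
The idea of adjusting group means for a covariate can be illustrated with a
simplified, single-factor sketch. It is not the two-way analysis reported in
Table 18: it uses one grouping factor, a pooled within-group slope, and four
invented data points. Each group mean is shifted by the slope times the
difference between the group's mean covariate and the overall mean covariate.
All names and values below are assumptions made for illustration.

```python
def adjusted_means(groups):
    """Covariate-adjusted group means using a pooled within-group slope.

    `groups` maps a group name to a list of (covariate, outcome) pairs,
    e.g. (years between tests, Fisher's z).
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)

    sxy = sxx = 0.0
    means = {}
    for name, pairs in groups.items():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        means[name] = (mean_x, mean_y)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)

    slope = sxy / sxx  # pooled within-group regression slope
    return {name: mean_y - slope * (mean_x - grand_x)
            for name, (mean_x, mean_y) in means.items()}


# Invented example: prekindergarten tests are paired with longer gaps
# between predictor and outcome than kindergarten tests.
DATA = {
    "prekindergarten": [(3.0, 0.45), (4.0, 0.35)],
    "kindergarten": [(1.0, 0.50), (2.0, 0.40)],
}
adjusted = adjusted_means(DATA)
print(adjusted)
```

In this invented case the unadjusted prekindergarten mean (0.40) is below the
kindergarten mean (0.45), but after adjusting for the longer time gap the
order reverses, which is the kind of shift a covariate adjustment can produce.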

A closer examination of the studies that are represented in Table 18
helps explain these results. First, most of the correlations fall in the
kindergarten and first-grade rows. One might expect that the prediction
times for studies using kindergarten tests and those using first-grade
tests would differ by about one year on average. But many of the
kindergarten studies tested in the spring and all the first-grade studies tested
in the fall because we classified any test given after fall of first grade
as an outcome. Thus, in many cases, the prediction times for kindergarten
and first-grade studies differed by only a few months. This short time
difference may account for finding no difference between the means of
kindergarten and first-grade prediction times.



The high average value for prekindergarten tests may be caused by an
abundance of "good" tests in prekindergarten studies. The 41 correlations
of prekindergarten tests with later outcomes come from four studies. These
studies, upon closer examination, used prekindergarten tests that we found
in the HSPV/FT study to be good predictors of later reading and math scores.
For example, one study reported correlations of 0.60, 0.59, and 0.49
between the WRAT and reading scores at first, second, and third grade.
These findings are comparable to our results from the HSPV/FT data, which
showed correlations of 0.70 and 0.64 between the WRAT reading subtest and
first- and second-grade reading scores. Thus we would expect the average
correlation for prekindergarten tests to be lower if we had found studies
with a wider range of prekindergarten tests.

The final result of note from Table 18 is the statistically significant
interaction between test time and test type. We cannot explain this result.
The interaction seems to be small in comparison to the main effects, and
we suspect that it is not of substantive significance.
The Predictive Validity of Teacher Judgment

Our final results deal with teacher judgment as a predictor of later achievement. As the Huron field study (Yurchak & Bryk, 1980) reported, teacher judgment is often used explicitly or implicitly to select ECT-I participants. Indeed, it is often suggested as a complement to or substitute for test results. We were able to locate studies with a total of 75 correlations between some kind of teacher judgment and reading, math, and language arts scores in grades 1, 2, and 3.

Table 19 presents a summary of what we found. Each cell averages across measures of teacher judgment in kindergarten and in first grade.






Table 19: Teacher Judgment as a Predictor of Reading, Math, and Language Arts Achievement*

                                Outcome Time
Outcome Tests       Grade 1     Grade 2     Grade 3     Row Averages
Reading             .41 (27)    .56 (4)     .46 (6)     .43
Math                .33 (11)    .51 (3)     --          .37
Language Arts       .37 (25)    --          .57 (1)     .37
Column Averages     .38         .56         .47

* The number in parentheses in each cell is the number of cases.


(We found no studies relating teacher judgment during prekindergarten with later outcomes.) Teacher judgment in these studies encompasses a range of activities. In some cases, teachers were asked to rate children's future achievement on a 5-point scale. In other studies, teachers were asked to judge children on the same criteria that the tests used. Still others used lengthy questionnaires for teachers to assess children. Studies also varied on how long teachers knew the children they assessed and on whether teachers had seen test results before making their assessment.

Table 19 shows that teachers seem to do well. Average correlations range from 0.33 to 0.56, which is not much different from the correlations for the best tests in the HSPV/FT data. We did not perform an analysis of variance on these data, but we can get some useful impressions from the row and column means. First, there seems to be little difference in predicting the three outcomes, although the average reading correlation is higher than the correlation for math or language arts. Second, we do not see a decline in average correlations as outcome time lengthens. In fact, the first-grade average correlation is lowest of the three. This indicates that teacher judgment may behave like background variables and unlike test scores; i.e., the predictive validity of background variables seems to be fairly stable over time whereas the predictive validity of early childhood tests declines over time.
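One plausible reading of the row means in Table 19 is that each is an average of the cell correlations weighted by the number of cases in the cell. That reading is an assumption, not something the paper states. The short Python sketch below checks it against the Reading and Math rows, using the correlations and case counts printed in the table; the function name is illustrative.

```python
def weighted_mean_r(cells):
    """Average correlations, weighting each by its number of cases.

    `cells` is a list of (correlation, number_of_cases) pairs.
    """
    total_cases = sum(n for _, n in cells)
    return sum(r * n for r, n in cells) / total_cases


# (correlation, cases) pairs as printed in Table 19
reading = [(0.41, 27), (0.56, 4), (0.46, 6)]
math_row = [(0.33, 11), (0.51, 3)]

reading_avg = round(weighted_mean_r(reading), 2)
math_avg = round(weighted_mean_r(math_row), 2)
print(reading_avg, math_avg)
```

Both values agree with the printed row averages (.43 and .37). This is consistent with case weighting but does not rule out other averaging methods.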

We need to note an important caveat regarding the studies from which our results on teacher judgment were obtained. Almost all of the correlations (68) measured the relationship between teacher judgment in the spring of kindergarten and test scores in grades 1, 2, and 3. No study reported results from the fall of kindergarten, and there were only 7 correlations that resulted from fall first-grade teacher judgment. Thus one reason teacher judgment accurately predicts later scores may be that the teachers knew the children they were assessing for nearly a full school year. But teacher judgment used as part of an ECT-I selection procedure would most likely take place in the fall of the year, when the teachers have known the children they are assessing for only a short time. Thus we must be cautious about being overly enthusiastic about teacher judgment until more data are available.
Limitations and Caveats

Meta-analysis is a new and somewhat controversial analytic technique; therefore, one must be cautious in its use. As meta-analysis has been used to synthesize results in more and more areas, critics have raised some important concerns (see, for example, Eysenck, 1978; Gallo, 1978; and replies to Rosenthal and Rubin, 1978). This paper is not the place to examine the "virtues and vulnerabilities" of meta-analysis in general (see Hauser-Cram, Note 1, and Jackson, 1980), but it is appropriate to raise some caveats and limitations regarding the results from our application of the technique.

Our use of meta-analysis to examine potential ECT-I selection variables has been an exploratory process. We have sought to move beyond what Glass and his colleagues have tried. They attempt a single meta-analysis: for example, synthesizing the effects of class size on achievement or the effects of psychotherapy. We, on the other hand, have tried to synthesize findings from several areas simultaneously, analyzing studies employing one or more prediction variables from several sets of variables (background measures, teacher judgment, and early childhood tests, for example). We have also attempted to examine a wide range of criterion measures and outcome times.






Because meta-analysis is a new approach and because we have applied it in several ways simultaneously, findings of our analysis must be viewed cautiously. One way in which we exercised caution was to compare and contrast the meta-analysis findings with the results from our re-analysis of the HSPV/FT data. Where results are similar (for example, those on the predictive validity of background variables), we are fairly confident of the meta-analysis findings. But where the meta-analysis produced results at odds with the HSPV/FT data, we are more skeptical. For example, when we found prekindergarten tests more highly correlated than kindergarten or first-grade tests with later outcomes, we began looking for alternative explanations. Likewise, although teacher judgment shows promise as a predictor of later achievement, our conclusions in this regard must be tentative because teacher judgment was not included in the HSPV/FT data.

Another reason for caution is the complex way in which study characteristics appear to influence study outcomes. Glass argues that the influence of such characteristics as sample size can be ignored if the correlation between the characteristic and the magnitude of the relationship is near zero; he does not discuss at length what to do if a relationship is not near zero. When we looked at the relationships between study characteristics and magnitude of r's, we found some large and some counterintuitive results. Table 20 presents correlations between four study characteristics (attrition rate, predictor and outcome reliability, and sample size) and the size of r's reported in these studies.

Table 20: Correlations Between Magnitude of Pearson's r and Selected Study Characteristics for Studies Reporting Relationships Between Early Childhood Tests and Reading, Math, and Language Arts Achievement

Characteristic            Mean     Standard Deviation    Cases    Correlation
Attrition Rate            22.6      20.0                  392      .29*
Predictor Reliability     80.3      12.4                  404      .31*
Outcome Reliability       86.0       5.0                  556      .03
Number of Subjects       170.0     256.9                  624      .10*

*p < .05

Three relationships are statistically significant and two are fairly substantial. Attrition rate (which is measured by percentage of subjects missing for later measurement) has a strong relationship (0.29) with correlation size, but it is in the unexpected direction (that is, attrition and the size of the correlation are directly rather than inversely related). Unless attrition is random, we would expect correlations to decrease with higher attrition rates because the sample becomes more homogeneous. Here we find the opposite relationship. Predictor reliability is in the expected direction, but its relationship with the size of the r's is surprisingly high (0.31). The relationship between outcome reliability and correlation size is essentially zero, probably because of the low variability in outcome reliability in our sample. Sample size and magnitude of r's are weakly related, with studies using larger samples tending to produce slightly higher correlations.
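The expected direction for predictor reliability follows from the classical attenuation relation of test theory: measurement error in either variable shrinks the observed correlation by the square root of the product of the two reliabilities. The Python sketch below illustrates that textbook relation only. It is not the authors' analysis; the "true" correlation of 0.70 and the reliability values are invented for illustration, and reliabilities are expressed as proportions.

```python
from math import sqrt


def attenuated_r(true_r, predictor_reliability, outcome_reliability):
    """Observed correlation expected under classical test theory."""
    return true_r * sqrt(predictor_reliability * outcome_reliability)


# Hypothetical values: same underlying relationship, outcome reliability held fixed
for predictor_reliability in (0.60, 0.80, 0.95):
    observed = attenuated_r(0.70, predictor_reliability, 0.86)
    print(f"predictor reliability {predictor_reliability:.2f} -> observed r {observed:.2f}")
```

Under this relation, studies using more reliable predictors would report higher correlations even if the underlying relationship were identical.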

It is difficult to know what to make of these relationships. It seems likely that complex interactions among studies are at work. For example, it is possible that many studies of prekindergarteners are done by universities and research organizations and, for that reason, perhaps are better controlled. Predictive validity studies of reading readiness tests given to kindergarteners and first-graders may be done more often by school districts and may be less well controlled. With less well controlled studies, one would expect reduced correlations between predictors and outcome measures. To try to unravel these complexities is a substantial task requiring time and resources beyond those available to us. Hence our concern that the results of our meta-analysis be viewed with caution.
Summary of the Findings from the Meta-Analysis 

Overall, our meta-analysis of studies supports the conclusions we reached from re-analyzing the HSPV/FT data set.

• We found that background variables correlate weakly with later educational outcomes, but these correlations do not seem to decrease as the time between prediction and outcome measures increases.

• Some individual early childhood tests predict later outcomes fairly well, and some tests do better than others. Reading readiness tests appear to do best; non-cognitive tests seem to do worst.

• Both analyses revealed significant relationships between predictive power and time between measurement points, but these relationships appeared to be more complex in the meta-analysis data than in the HSPV/FT data. Probably this is due to interactions between study characteristics and the relation of time to predictive power.

• The meta-analysis data permitted us to examine teacher judgment as a predictor of later educational outcomes. Our tentative findings are that teacher judgment does nearly as well as tests and that its predictive power seems not to decline over time to the extent test results do.

CONCLUSIONS 

Our aim in this paper has been to inform discussion of ECT-I selection policy, and our main audience has been people at the federal level who think about, formulate, promulgate, and monitor such policy. Our study has been limited to a discussion of the predictive validity of early childhood variables that, in some combination, could make up part of local ECT-I selection strategies. The study has also been limited to the data at hand, data which were collected originally for other purposes. In this last section, we will mention briefly some considerations besides predictive validity that should be taken into account in a complete examination of ECT-I selection policy. Then we will discuss the implications of our findings for ECT-I selection policy.

Some Other Considerations

This paper has assessed the predictive validity of some early childhood variables that could be part of an ECT-I selection process. Of course, there are other criteria for judging selection variables and for assessing the overall process by which young children are chosen to receive ECT-I services. This subsection will briefly discuss some important considerations that we have not discussed in this paper.






Other Aspects of Choosing ECT-I Children. Determining who receives ECT-I services is a three-staged process: Title I attendance areas are specified (on economic grounds), a pool of eligible children residing in the attendance area is identified (based on educational need), and the neediest children are selected from that pool. We have examined only aspects of the last stage, selection. But the other stages need to be considered in any overall discussion of ECT-I identification and selection. The second stage, identification, is particularly problematic for ECT-I. Identification for Title I programs aimed at children in grades 2-12 is made easier because almost all potentially eligible children are in school and available for identification. Children are not so readily available for ECT-I identification since ECT-I programs often provide the first school experiences for educationally disadvantaged children. Some work has been done on the problem of identifying young children in need of services (see Hauser-Cram, Note 2; Yurchak, Note 3), and further consideration seems warranted.

Costs of Selection Procedures. We have said little in this paper about the monetary and non-monetary costs of ECT-I selection, which are obvious concerns in assessing any selection procedure. We have seen that several variables together usually predict later outcomes more accurately than one variable, for example, a test score. But using multiple measures such as a combination of test scores, background variables, and teacher judgment to select ECT-I children may be expensive, using resources that might be better spent serving those children who are selected. Moreover, multiple measures can be a burden on teachers, children, and parents. Giving several tests, collecting background information, and judging children's readiness for school can be laborious for teachers and, worse, can take time from instruction. Providing detailed information about their children can be annoying or threatening to parents. Data collection, especially test taking, can be boring, confusing, or threatening for children. Such costs should be assessed in judging alternative selection procedures.

Problems in Assessing Young Children. In judging variables that could make up an ECT-I selection procedure, one must continually keep in mind the special problems in assessing young children. In another report from Huron's study of ECT-I evaluations, Haney and Gelberg (1980) not only point out that "tests and instruments for use with young children are generally of lower technical quality than those for use with older children" (p. 7) but also argue that preschool children often lack the physical, intellectual, and emotional prerequisites necessary for systematic assessment. Given these special difficulties it may make sense, especially when selecting prekindergarteners or children who have had no school or preschool experience, to emphasize variables that are not so dependent on obtaining direct information from young children in strange situations: variables such as family characteristics, teacher judgment, and sibling information.
Selection Bias. Much of the discussion about bias against minority groups in the literature dwells on the misuse of standardized tests leading to the misclassification of children. (For example, see Mercer, 1975.) But such discussions could be broadened to other variables of a selection strategy. Haney and Kinyanjui (1979), who aim their discussion at tests, provide a useful perspective on bias. They argue that tests are not usually biased but the use of tests may be. By extension, the components of an ECT-I selection procedure (which might include test scores, background variables, and teacher judgment, for example) probably would not be intentionally biased for or against minorities. However, the use of the strategy might be biased.






Haney and Kinyanjui argue that "a test is biased if, when it is used to make decisions or inferences about a person or group, those decisions or inferences are less valid than those made when it is used analogously with people generally" (p. 5). Their definition of bias rests on the validity of the decision or inferences resulting from the test. Similarly an ECT-I selection procedure might be said to be biased if the selection decisions for some groups are less valid than decisions for all children. Furthermore, the predictive accuracy of a procedure is one criterion in terms of which to judge its bias. That is, if a variable, a set of variables, or a selection procedure has lower predictive validity for some group than for others, it might be viewed as biased.
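One way to act on this definition is to compute the predictor-outcome correlation separately for each group of children and compare the results. The Python sketch below does this with invented scores for two hypothetical groups. The group labels, data, and function names are illustrative assumptions, and a real check would need adequate samples and attention to sampling error before treating a difference as meaningful.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def validity_by_group(records):
    """Return {group: r} from (group, predictor_score, outcome_score) records."""
    by_group = {}
    for group, predictor, outcome in records:
        xs, ys = by_group.setdefault(group, ([], []))
        xs.append(predictor)
        ys.append(outcome)
    return {group: pearson_r(xs, ys) for group, (xs, ys) in by_group.items()}


# Invented data: the predictor tracks the outcome in group A but not in group B
records = [
    ("A", 1, 2), ("A", 2, 4), ("A", 3, 6), ("A", 4, 8),
    ("B", 1, 3), ("B", 2, 1), ("B", 3, 4), ("B", 4, 2),
]
validities = validity_by_group(records)
for group, r in sorted(validities.items()):
    print(group, round(r, 2))
```

In this invented example the same predictor would be far less valid for group B than for group A, which is the pattern the definition above would flag as possible bias.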
Implications for ECT-I Selection Policy

Importance of Predictive Validity. The importance of prediction stems from the central goal of most ECT-I programs: the prevention of educational problems in later schooling. This goal, together with the requirement of selecting the neediest children, suggests that selection be based at least in part on which children are most likely to experience later educational disadvantage. Thus, we have argued and tried to demonstrate that the predictive validity of background variables, teacher judgment, and test scores is an important consideration.

The implication here is that local program staff should examine the predictive validity of their selection method. This means more than just looking up a validity index for the tests they use. It means studying the predictive validity of their procedures in terms of their unique population of children, since similar procedures can produce varying results with different samples, as we saw in the HSPV/FT re-analysis and the Shipman data. Another reason for encouraging local staffs to examine the predictive validity of their selection methods is that definitions of educational disadvantage differ from community to community. Some LEAs are most concerned about preventing future reading problems; others want to help children at risk become better prepared to achieve later school success in general. Moreover, some LEAs emphasize standardized test scores as measures of later school success; others are more interested in grades or students' attitudes toward school. Although we did not find striking differences in how well early childhood variables predict different outcomes, in some cases the composition of a set of variables or the weighting of the variables that go into a selection strategy may differ depending upon how a district chooses to define educational disadvantage in later grades.

Useful Statistical Procedures. Fortunately, some statistical tools exist to help local staffs assess the predictive validity of their selection procedures. We have described two in this paper: examining R² (and increments to R²) and examining misclassification rates. Using the latter seems to be a particularly fruitful approach to assessing different strategies; it makes explicit the errors and the successes of any strategy and can help the staff focus their attention on the costs and benefits to the children they select and do not select.
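Both tools can be sketched in a few lines of Python. The first function uses the standard identity for R² with two predictors, computed from the three pairwise correlations, so the increment from adding a second variable is the difference from the squared single-predictor correlation. The second counts misclassifications under an assumed selection rule (the lowest-scoring children are selected, and a child counts as needy if the later outcome falls below a cutoff). The rule, the cutoff, and all data and correlations are hypothetical and are not taken from this paper's analyses.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R-squared for an outcome regressed on two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def misclassification(predictor, outcome, n_select, outcome_cutoff):
    """Count selection errors when the n_select lowest predictor scores are chosen.

    A child is treated as truly needy if the later outcome is below outcome_cutoff.
    """
    order = sorted(range(len(predictor)), key=lambda i: predictor[i])
    selected = set(order[:n_select])
    # Selected, but later outcome was at or above the cutoff
    false_positives = sum(1 for i in selected if outcome[i] >= outcome_cutoff)
    # Not selected, but later outcome fell below the cutoff
    false_negatives = sum(
        1 for i in range(len(predictor))
        if i not in selected and outcome[i] < outcome_cutoff
    )
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "error_rate": (false_positives + false_negatives) / len(predictor),
    }


# Hypothetical correlations: test with outcome (0.6), background variable with
# outcome (0.3), and test with background variable (0.2)
r2_test_only = 0.6 ** 2
r2_both = r_squared_two_predictors(0.6, 0.3, 0.2)
print("increment to R-squared:", round(r2_both - r2_test_only, 3))

# Hypothetical early and later scores for six children
early_scores = [10, 20, 30, 40, 50, 60]
later_scores = [15, 45, 25, 35, 55, 65]
result = misclassification(early_scores, later_scores, n_select=2, outcome_cutoff=30)
print(result)
```

Separating false positives from false negatives keeps the two kinds of error visible, since serving a child who would have done well and missing a child who later struggles carry different costs.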

Importance of Longitudinal Data. Clearly, data collected over time are needed to assess predictive validity. All the data we used were longitudinal. The HSPV/FT data set and the Shipman data followed some children from prekindergarten through third grade. Studies included in the meta-analysis measured children as early as two years before kindergarten and as late as sixth grade. The shortest time span of studies included in the meta-analysis was six months, from fall of first grade to spring of first grade.






To judge the predictive validity of a selection strategy, data are needed at the time of selection and at a future time when some important event will take place (such as success or failure in third-grade reading). Many districts have neither the capacity nor the expertise to collect and analyze longitudinal data. However, given the importance of these data not only for studying selection but for evaluating programs, help should be provided to LEAs to enhance their capacity in this respect. Some of this help might come from Title I Technical Assistance Centers. Additional help might come from consortia of LEAs having ECT-I programs. These districts could band together to share computer facilities, analytic strategies, and even data.
Some LEAs have the capability to collect and analyze longitudinal data but may lack the resources needed to apply the data to a topic like local ECT-I selection. One large district we visited had extensive longitudinal files tracing Follow Through children for several years after the program. Some files held test scores and background information; other files held student identification. The only problem was a lack of resources to merge these files and apply them to early childhood selection and evaluation questions. Given the importance of a predictive perspective and the difficulty in collecting and analyzing longitudinal data, encouragement and support from USED for collecting and analyzing such data seem warranted.
Selecting Instruments and Variables. Both the re-analysis of the HSPV/FT data and the meta-analysis suggest instruments and variables that have roles to play in selecting ECT-I children. From the standpoint of prediction, early childhood tests seem to have a place in selection strategies. Some instruments, such as the WRAT and the PSI, appear to be fairly accurate predictors of later achievement. Others seem to have less accuracy. Thus the predictive validity of a test should be considered in deciding whether to use it.




Although some tests have reasonably high predictive validity, we have seen that adding other variables and assessment methods to test results can improve the accuracy of a selection procedure. One set of such variables is SES-related measures, which usually improve prediction of later achievement and do not seem to diminish in predictive power over time spans for which we had data. In addition to simple background variables such as income and parents' education, some ECT-I programs may have access to or the capacity to collect more sophisticated variables such as measures of parent-child interaction. We have little data on the usefulness of such variables for ECT-I selection, but local efforts to collect and assess such information warrant some encouragement and support.

Teacher judgment also has a place in ECT-I selection. Tentative findings from our meta-analysis show that teacher judgment may do as well as early childhood tests in predicting later achievement. As we noted earlier, teacher judgment may be particularly useful in selecting very young children, whose lack of skills and school experience reduce the reliability and validity of tests. Unfortunately, we had no data that allowed us to examine how much teacher judgment would add to the predictive power of tests and background variables. In addition, there seems to be a dearth of data on teachers' ability to assess prekindergarteners. Clearly, here is an area for further research, some of which could be carried on by LEAs with or without TAC assistance.

Finally, there are several potentially fruitful instruments and variables for which we have no information and for which further investigation is warranted. We had no data and found no studies that used early childhood criterion-referenced tests to predict later outcomes, although this use for CRTs has been suggested (for example, Stenner et al., 1976). Parental judgment is another area for which we have no data, but where some further investigation may be warranted (see Johansson, 1965, for a study of parental judgment in Sweden). In our descriptive study of ECT-I programs, we found several LEAs using information about siblings and about language proficiency to select children. Again we cannot comment on these approaches except to say that they deserve examination.



REFERENCE NOTES

1. Hauser-Cram, P. Research synthesis: Virtues and vulnerabilities. Unpublished paper, Kennedy School of Government and Harvard Graduate School of Education, 1980.

2. Hauser-Cram, P. Developmental screening: A review and analysis of key issues. Report in preparation for the Huron Institute, Cambridge, MA, 1979.

3. Yurchak, Mary Jane. Identifying and selecting children for early childhood Title I programs. Paper in preparation for the Huron Institute, Cambridge, MA, forthcoming.



REFERENCES 



Becker, W.C. Teaching reading and language to the disadvantaged. Harvard Educational Review, November 1977, 47, 518-543.

Bryant, E.C., Glaser, E., Hansen, M.H., & Kirsch, A. Associations between educational outcomes and background variables: A review of selected literature. Denver, CO: National Assessment of Educational Progress, 1974.

Buros, O.K. (Ed.). The seventh mental measurements yearbook. Highland Park, N.J.: The Gryphon Press, 1972.

Cohen, J., & Cohen, P. Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, N.J.: Lawrence Erlbaum Associates, Publishers, 1975.

Eysenck, H.J. An exercise in mega-silliness. American Psychologist, 1978, 33, 517.

Fisher, R.A. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 1915, 10, 507-521.

Gallo, P.S. Meta-analysis--A mixed meta-phor? American Psychologist, 1978, 33, 515-517.

Glass, G.V. Primary, secondary, and meta-analysis of research. Educational Researcher, 1976, 5, 3-8.

Glass, G.V. Integrating findings: The meta-analysis of research. In L.S. Shulman (Ed.), Review of research in education (Vol. 5). Itasca, Ill.: F.E. Peacock Publishers, 1977.

Haney, W.M. The Follow Through planned variation experiment. Vol. 5: A technical history of the national Follow Through evaluation. Cambridge, MA: The Huron Institute, 1977.

Haney, W.M., & Gelberg, J.W. Assessment in early childhood education. Cambridge, MA: The Huron Institute, 1980.

Haney, W.M., & Kinyanjui, K. Competency testing and equal educational opportunity. IRCD Bulletin, 1979, 14, 1-11.

Jackson, G.B. Methods for integrating reviews. Review of Educational Research, 1980, 50, 458-460.

Johansson, B.A. Criteria of school readiness: Factor structure, predictive value, and environmental influences. Uppsala: Almqvist and Wiksells Boktryckeri AB, 1965.






McNemar, Q. Psychological statistics (4th ed.). New York: Wiley, 1969.

Mercer, J.R. Psychological assessment and the rights of children. In N. Hobbs (Ed.), Issues in the classification of children (Vol. 1). San Francisco: Jossey-Bass, Inc., Publishers, 1975.

Molitor, J.A., Watkins, M., & Napior, D. Education as experimentation: A planned variation model. The non-Follow Through study. Cambridge, MA: Abt Associates, Inc., 1977.

Nie, N.H., Hull, C.H., Jenkins, J.G., Steinbrenner, K., & Bent, D.H. Statistical package for the social sciences (2nd edition). New York: McGraw-Hill, 1975.

Rosenthal, R., & Rubin, D.B. Interpersonal expectancy effects: The first 345 studies. The Behavioral and Brain Sciences, 1978, 3, 377-415.

Shipman, V.C., McKee, D., & Bridgeman, B. Stability and change in family status, situational, and process variables and their relationship to children's cognitive performance. Disadvantaged children and their first school experiences: ETS Head Start Longitudinal Study, September 1976. 261 pages. ERIC Reproduction Number ED 1383*9.

Stenner, A.J., Feifs, H.A., Gabriel, B.R., & Davis, B.S. Evaluation of the ESEA Title I program of the public schools of the District of Columbia, 1975-76. Washington, D.C.: The Public Schools of the District of Columbia, no date.

Subkoviak, M.J. Decision-consistency approaches. In R.A. Berk (Ed.), Criterion-referenced measures: The state of the art. The Johns Hopkins University Press, 1980.

Walker, D.K., Bane, M.J., & Bryk, A.S. The quality of the Head Start planned variation data (2 Vols.). Cambridge, MA: The Huron Institute, 1973.

Weisberg, H.I., & Haney, W.M. Longitudinal evaluation of Head Start planned variation and Follow Through. Cambridge, MA: The Huron Institute, 1977.

White, K.R. The relationship between socioeconomic status and academic achievement. (Doctoral dissertation, Univ. of Colorado, 1976).

Yurchak, M.J., & Bryk, A.S. ESEA Title I early childhood education: A descriptive report. Cambridge, MA: The Huron Institute, 1980.

Yurchak, M.J., Gelberg, J.W., & Darman, L. Description of early childhood Title I programs. Cambridge, MA: The Huron Institute.






